JDS Academy - course syllabus

The Junior Data Scientist Academy

This online course provided (and still provides) a comprehensive introduction to the theory and practice of a data scientist's tools. It covered data analysis and predictive modeling, blending practical tools with foundational concepts. Over several weeks, we progressed from gathering data and automating data workflows to using SQL for data organization and segmentation.

Business-focused topics such as revenue segmentation, key metrics, and visualization techniques were explored next, followed by advanced analytical methods including funnel analysis and cohort studies. The final weeks introduced Python’s Pandas library for data manipulation, regression techniques for prediction, and beginner-friendly machine learning (ML) concepts such as classification. By the end, participants had gained hands-on skills to analyze data and derive actionable insights.

There were almost no prerequisites regarding Linux (Bash), Python, or data science, so beginners could join. Those who already had some practice in these areas, as I did, found it easier to solve the trickier tasks. There was an entry task before anyone became eligible to join the course: create an online Linux server (Linode), configure it, then grant the tutor access to verify the result. This may have been a steep learning curve for real beginners.

  1. WEEK #0
    • Linux installation on Workstation and on Linode (web)server
    • Basic Bash commands (Linux)
    • Set up Python environment (3.x)
    • Basic Python with practice
    • Python module imports, functions
  2. WEEK #1
    • Welcome
    • Get the data
    • ETL bash
    • ETL python
    • Automate
  3. WEEK #2
    • Put your data into SQL (python + SQL)
    • Automate the SQL load
    • Verify and analyze (SQL) - basic SQL queries
    • Data analysis and data-modifying SQL queries
    • Complex analytics - JOINs
    • Database/Table modifications
  4. WEEK #3
    • Segments
    • Segmentation (Revenue)
    • Business metrics
    • Visualize (Matplotlib, PowerBI)
  5. WEEK #4
    • Funnels
    • Cohort Analysis
    • Cohort Analysis 2
  6. WEEK #5
    • Pandas Intro
    • Prediction with Regression
    • Simple Machine Learning: Classification
    • Further Classification
  7. WEEK #6
    • Presentations
    • One-on-one meeting(s)
    • Summary, takeaways, ...

During the SQL part of the course we practiced all four main categories of SQL statements:

  • Database Modification (DM): commands that change data or table structure, like INSERT, UPDATE, DELETE, and ALTER.
  • Data Analysis (DA): queries used to retrieve and analyze data, such as SELECT with aggregate functions (SUM, AVG, etc.) and GROUP BY.
  • Database Creation (DC): commands that create new database objects, like CREATE TABLE, CREATE VIEW, and CREATE INDEX.
  • Database Control (DCn): commands that manage database access and constraints, like GRANT, REVOKE, and SET.
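
The categories above can be tried out in a few lines of Python. This is only a minimal sketch: it uses the standard-library sqlite3 module with an in-memory database as a stand-in for a real database server, and the table name `sales` and its rows are made up for illustration. SQLite has no user management, so the access-control commands (GRANT, REVOKE) are left out.

```python
import sqlite3

# In-memory SQLite database: an assumed stand-in for a real database server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Creation: define a new (hypothetical) table.
cur.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, country TEXT, amount REAL)"
)

# Modification: add rows, then change data, remove data, and alter the table.
cur.executemany(
    "INSERT INTO sales (country, amount) VALUES (?, ?)",
    [("HU", 10.0), ("HU", 30.0), ("DE", 50.0), ("AT", 5.0)],
)
cur.execute("UPDATE sales SET amount = 20.0 WHERE country = 'HU' AND amount = 10.0")
cur.execute("DELETE FROM sales WHERE country = 'AT'")
cur.execute("ALTER TABLE sales ADD COLUMN currency TEXT DEFAULT 'EUR'")
conn.commit()

# Analysis: count, sum, and average the amounts per country.
cur.execute(
    "SELECT country, COUNT(*), SUM(amount), AVG(amount) "
    "FROM sales GROUP BY country ORDER BY country"
)
summary = cur.fetchall()
for row in summary:
    print(row)

conn.close()
```

The same statements should carry over to PostgreSQL with little change, but the connection code would differ.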

JDS Academy - A/B testing

A/B testing is a powerful method for comparing two versions of a webpage, ad, or product feature to determine which one performs better. The JDS course on the basics of A/B testing combined online teaching with practical parts, in which I explored how to design experiments, split an audience into groups, and measure key performance metrics like click-through rates or conversions. The course covered critical concepts such as formulating hypotheses, ensuring statistical significance, and interpreting results to make data-driven decisions. Practical examples and hands-on exercises were provided to build confidence (above 95% 😁) in applying A/B testing to real-world scenarios.
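
Splitting an audience into groups can be illustrated with a short Python sketch. It is not the course's implementation; it shows one common approach, hashing a user ID so that the same visitor always lands in the same group. The function name `assign_variant`, the experiment label, and the user IDs are all hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str = "landing_page_test") -> str:
    """Deterministically assign a user to variant 'A' or 'B' (50/50 split)."""
    # Hash the experiment name together with the user ID so that different
    # experiments split the same audience independently of each other.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % 100  # bucket number in the range 0..99
    return "A" if bucket < 50 else "B"


# Made-up user IDs, only to check that the split is roughly even.
users = [f"user-{i}" for i in range(10_000)]
groups = [assign_variant(u) for u in users]
share_a = groups.count("A") / len(groups)
print(f"share of users in A: {share_a:.3f}")
```

Because the assignment depends only on the ID and the experiment name, no lookup table is needed to keep a returning visitor in the same group.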

Here is a non-exhaustive list of the wide range of topics:

  • What is A/B testing exactly? What can you A/B test?
  • The correlation vs. causation issue
  • A/B testing: when/why/what/how/how long?
  • The limitations of A/B testing
  • The four steps of executing an A/B test
  • 80%? 95%? 99%? What's the right confidence level?
  • An intuitive way of interpreting the importance of statistical significance
  • Calculating the required sample size before the test
  • Segmentation, filtering
  • Important A/B Testing mindset: Conversion Rate Optimization or Research?
  • Implementation and typical mistakes
  • Typical mistakes while running an A/B test
  • Setting a hypothesis
  • Avoiding website flickering in an A/B test
  • The change-one-thing-at-a-time myth, multivariate tests, A/B/n tests and unusual audience splits
  • Tool demo (Google Optimize)
  • What to do after evaluating your experiment
  • Statistical significance and the certainty of your results
  • 1) Traffic allocation: sending visitors to the old/new landing pages (with JavaScript)
  • 2) Data collection and visualization (Bash + SQL + Google Data Studio)
  • 3) Evaluation: significance calculations (using Python)
  • ACTION: Filter out the bad A/B test ideas from your backlog!
  • ACTION: Review your A/B testing backlog!
  • ACTION: The A/B tests you've run so far -- and the A/B tests you want to run in the future
  • EXTRA: Hand-picked A/B Testing case studies
  • EXTRA: online calculators (statistical significance and sample size)
  • ACTION: Run your first research round!
  • ACTION: Try out the simplest ever A/B test!
  • ACTION: your own knowledge base!
  • EXTRA: A/B Testing Hypothesis Form
  • Summary & The right mindset to be successful with A/B testing
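
The evaluation step in the list above (significance calculations using Python) can be sketched with a two-proportion z-test. This is one standard way to compare two conversion rates and not necessarily the exact method taught in the course. The visitor and conversion counts are made-up example numbers, and the function name is hypothetical.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis "A and B perform equally".
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Standard normal CDF expressed with the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Made-up example: 5000 visitors per variant, 200 vs. 250 conversions.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
significant = p_value < 0.05  # corresponds to a 95% confidence level
print(f"z = {z:.2f}, p-value = {p_value:.4f}, significant: {significant}")
```

Changing the threshold from 0.05 to 0.20 or 0.01 corresponds to the 80% and 99% confidence levels mentioned in the topic list.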
updated with photo: Dec.2024

JDS Academy - Accidents database analysis

timing of online course: March 2020 - June 2020
website: https://data36.com/
useful: Yes! Worth completing (mainly in Hungarian).
paid service: yes

Homework during the course came with a deadline, usually five days. In one case, a European international accident database was at the center of a team project: we had to formulate the questions, find the answers, summarize the results, and present them orally to the course teachers.

The data was provided as a CSV file with more than 10,000 entries. The fields included AccidentID, Country, Damage (cost), whether the accident was reported (to any authority or insurance company), and so on.

The data exploration and queries were done in PostgreSQL, and the extracted data was plotted either in Excel or Google Sheets. Finally, the presentation was created in MS PowerPoint.
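
As an illustration of this kind of exploration, here is a minimal Python sketch that totals the damage per country from CSV data. Only the column names AccidentID, Country, and Damage come from the dataset description. The rows are invented, the function name is hypothetical, and the real analysis was done in PostgreSQL, not in Python. The logic is roughly what a GROUP BY on Country with COUNT and SUM would do in SQL.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows; the real file had more than 10,000 entries.
SAMPLE_CSV = """AccidentID,Country,Damage
1,HU,1200
2,DE,300
3,HU,800
4,AT,500
5,DE,700
"""


def damage_by_country(csv_file):
    """Count accidents and sum the damage for each country."""
    totals = defaultdict(lambda: {"accidents": 0, "damage": 0.0})
    for row in csv.DictReader(csv_file):
        entry = totals[row["Country"]]
        entry["accidents"] += 1
        entry["damage"] += float(row["Damage"])
    return dict(totals)


stats = damage_by_country(io.StringIO(SAMPLE_CSV))
# List the countries with the highest total damage first.
for country, s in sorted(stats.items(), key=lambda kv: kv[1]["damage"], reverse=True):
    print(country, s["accidents"], s["damage"])
```

For a real file, `io.StringIO(SAMPLE_CSV)` would be replaced with an opened file object.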

The Accidents database analysis in PDF form. The English version was translated and restyled by me.

The Jupyter Notebook and all related files on GitHub - Accidents analysis.
