OnlineRetail dataset, Analysis#2 - SQL

DataCamp - OnlineRetail dataset,  Analysis #2 - SQL with Notebook

Previously I analyzed the below presented dataset with python with the help of the in-built visualization of the Notebook offered on DataCamp website and also using matplotlib for special purposes. I created another analysis with and MS Power BI.

I have elaborated the answers for the so called Online Retail dataset with SQL queries and zipped all required files (data and explanation as csv files, notebook as ipynb file) GitHub. If you have no access to any Notebook that is compatible* with the ipynb file then a pdf version is available here: SQL queries (0.1 MB) or the complete python and SQL solutions without the coding part Python+SQL (12 MB)

My solution includes additional details and goes beyond the required questions.

The Online retail project (original data source) is presented by DataCamp in steps:
  1. Problem definition of E-Commerce Data 

  2. Data Dictionary - explaining the content, the data types and the meaning of specific values or signs


  3. "Don't know where to start?" session defines Exploration, Analysis, and Visualisation challenges:
    1. Explore: Negative order quantities indicate returns. Which products have been returned the most?
    2. Visualize: Create a plot visualizing the profits earned from UK customers weekly, monthly.
    3. Analyze: Are order sizes from countries outside the United Kingdom significantly larger than orders from inside the United Kingdom?

With SQL visualization, and statistical analysis is not possible, but the data can be extracted for further analysis. DataCamp website in-built Notebook functionality can be of help in case of visualization, but t-probe statistics is still not doable on a pure SQL basis.

Duck DB's SQL is a bit different from other SQL languages so loading the data already required a specific method: 
Loading data from csv - Duck DB method

Searching for the most returned items during the whole period defined by the dataset:
Most returned items - SQL query

Preparing data for profit  (in UK) visualization at different timescales:

See the Python-based study or the Power BI solution on the same topic.

* Compatible Notebook readers/editors: DataCamp online Notebook (requires registration) Please note that Jupyter Notebook versions of any kind are not compatible automatically with any kind of SQL interpreters. There are ways (search online for solutions), but I recommend reading the pdf version provided above.

DC OR editable (for myself)

No comments:

Post a Comment

Snowflake universe, part #6 - Forecasting2

Forecasting with built-in ML module Further posts in  Snowflake  topic SnowFlake universe, part#1 SnowFlake, part#2 SnowPark Notebook Snow...