Online Retail dataset, Analysis #1 - Python

DataCamp - Online Retail dataset, Analysis #1 - Python supported by Notebook functionality

I have worked out the answers for the so-called Online Retail dataset and uploaded all required files as a zip archive (data and explanation as csv files, notebook as an ipynb file) to GitHub. If you have no access to a Notebook reader that is compatible* with the ipynb file, a pdf version is available here: Analysis with Python (10 MB)

My solution includes additional details and goes beyond the required questions.

The Online Retail dataset (original source) is presented by DataCamp in three steps:

  1. Problem definition of E-Commerce Data 

  2. Data Dictionary - explaining the content, the data types and the meaning of specific values or signs


  3. The "Don't know where to start?" section defines Exploration, Analysis, and Visualisation challenges:
    1. Explore: Negative order quantities indicate returns. Which products have been returned the most?
    2. Visualize: Create a plot visualizing the profits earned from UK customers weekly, monthly.
    3. Analyze: Are order sizes from countries outside the United Kingdom significantly larger than orders from inside the United Kingdom?
Here are some extracted figures.
Monthly income and repayment
The monthly income (Quantity * UnitPrice for rows with a positive Quantity) and the repayments (Quantity * UnitPrice for rows with a negative Quantity, i.e. returns). The December - January returns follow the increased purchases of October - December. "End of the year" craziness. :)
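The monthly split into income and repayments can be sketched as follows. This is only an illustration under stated assumptions, not necessarily how the Notebook does it: the sample rows are made up, the function name `monthly_totals` is hypothetical, and the sketch uses only the Python standard library so that it runs without any extra installation. Column names follow the dataset (InvoiceDate, Quantity, UnitPrice).

```python
from collections import defaultdict
from datetime import datetime

# Made-up sample rows: (InvoiceDate, Quantity, UnitPrice)
rows = [
    ("2011-10-03 10:15", 10, 2.50),
    ("2011-10-20 14:02", 4, 1.25),
    ("2011-11-05 09:30", 6, 3.00),
    ("2011-11-28 16:45", -2, 3.00),  # negative Quantity = return
    ("2011-12-04 11:00", -3, 2.50),  # negative Quantity = return
]


def monthly_totals(rows):
    """Sum Quantity * UnitPrice per month, separately for sales and returns."""
    income = defaultdict(float)
    repayments = defaultdict(float)
    for date_text, quantity, unit_price in rows:
        month = datetime.strptime(date_text, "%Y-%m-%d %H:%M").strftime("%Y-%m")
        amount = quantity * unit_price
        if quantity < 0:
            # Store repayments as positive numbers so both series plot upwards
            repayments[month] += -amount
        else:
            income[month] += amount
    return dict(income), dict(repayments)


income, repayments = monthly_totals(rows)
for month in sorted(set(income) | set(repayments)):
    print(month, income.get(month, 0.0), repayments.get(month, 0.0))
```

Keeping the repayments in their own series, with the sign flipped, makes it easy to draw them next to the income on the same axis.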

The two figures below visualize the question "How often is each product returned (frequency), and in what amount?" They help to spot the outliers: the weakest products and the best-sold ones (with the lowest return rate) in our portfolio.
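A minimal sketch of how return frequency and returned amount per product could be computed before plotting. The sample rows, the function name `return_stats` and the definition of the return rate (returned units divided by sold units) are assumptions for illustration; the Notebook may define these measures differently.

```python
from collections import defaultdict

# Made-up sample rows: (StockCode, Quantity)
rows = [("A", 10), ("A", -2), ("A", -1), ("B", 5), ("B", -5), ("C", 8)]


def return_stats(rows):
    """Per product: units sold, units returned, number of return lines, return rate."""
    stats = defaultdict(lambda: {"sold": 0, "returned": 0, "return_lines": 0})
    for stock_code, quantity in rows:
        entry = stats[stock_code]
        if quantity < 0:
            entry["returned"] += -quantity  # amount returned
            entry["return_lines"] += 1      # how often it was returned
        else:
            entry["sold"] += quantity
    result = {}
    for stock_code, entry in stats.items():
        # The rate is undefined for products that were only ever returned
        rate = entry["returned"] / entry["sold"] if entry["sold"] else None
        result[stock_code] = {**entry, "return_rate": rate}
    return result


stats = return_stats(rows)
for stock_code, entry in sorted(stats.items()):
    print(stock_code, entry)
```

Plotting `return_lines` against `returned` for every product gives a scatter in which frequently and heavily returned products stand apart from the rest.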



The t-test result for the hypothesis "non-UK customers purchase goods in significantly higher amounts than UK customers". Further details can be found in the above-mentioned Notebook or pdf files.
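For readers who want to see what such a test computes, here is a hedged sketch of the Welch (unequal variance) variant of the two-sample t-test, written with the standard library only. Which variant the Notebook uses is not stated here, so Welch is an assumption, as are the function name `welch_t` and the made-up order sizes.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for Welch's t-test."""
    # Squared standard error of each sample mean
    se_a = variance(sample_a) / len(sample_a)
    se_b = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(sample_a) - 1) + se_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Made-up order sizes (total Quantity per order), for illustration only
non_uk_orders = [120, 95, 210, 180, 150, 175]
uk_orders = [40, 60, 55, 70, 35, 50]

t_stat, dof = welch_t(non_uk_orders, uk_orders)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof:.1f}")
```

A positive t statistic points in the direction of the hypothesis (non-UK orders larger). Turning it into a p-value needs the t distribution, which the standard library does not provide; with SciPy installed, `scipy.stats.ttest_ind(non_uk_orders, uk_orders, equal_var=False, alternative="greater")` should give the one-sided result directly.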


See the SQL-based study or the Power BI solution on the same topic.

* Compatible Notebook readers/editors: DataCamp online Notebook (requires registration) or Anaconda Jupyter Notebook (requires installation) or the online version of Jupyter
