DataCamp - Online Retail dataset, Analysis #1 - Python supported by Notebook functionality

My solution includes additional details and goes beyond the required questions.
The Online Retail dataset (original source) is presented by DataCamp in the following steps:
- Problem definition of E-Commerce Data
- Data Dictionary - explaining the content, the data types and the meaning of specific values or signs
- "Don't know where to start?" section defines the Exploration, Analysis and Visualisation challenges:
  - Explore: Negative order quantities indicate returns. Which products have been returned the most?
  - Visualize: Create a plot visualizing the profits earned from UK customers weekly and monthly.
  - Analyze: Are order sizes from countries outside the United Kingdom significantly larger than orders from inside the United Kingdom?
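A minimal pandas sketch of the Visualize step, assuming the data sits in a DataFrame with the columns `InvoiceDate` (parsed as datetime), `Country`, `Quantity` and `UnitPrice`. These columns carry no cost information, so the sketch uses net revenue (Quantity * UnitPrice, with returns counted as negative amounts) as a stand-in for profit. The helper name `uk_revenue` and the sample rows are made up for illustration and are not taken from the Notebook.

```python
import pandas as pd


def uk_revenue(df: pd.DataFrame, freq: str) -> pd.Series:
    """Net revenue (Quantity * UnitPrice) of UK rows, summed per period.

    freq is a pandas offset alias, e.g. "W" (weeks ending Sunday) or "MS" (calendar months).
    """
    # Keep UK rows only and index them by invoice date so they can be resampled in time
    uk = df.loc[df["Country"] == "United Kingdom"].set_index("InvoiceDate").sort_index()
    revenue = uk["Quantity"] * uk["UnitPrice"]
    # Empty periods are summed to 0, so the weekly series has no gaps
    return revenue.resample(freq).sum()


# Tiny made-up sample, only to show the mechanics
sample = pd.DataFrame({
    "InvoiceDate": pd.to_datetime(["2011-01-03", "2011-01-04", "2011-01-12", "2011-02-01"]),
    "Country": ["United Kingdom", "France", "United Kingdom", "United Kingdom"],
    "Quantity": [10, 5, -2, 4],
    "UnitPrice": [2.5, 1.0, 2.5, 3.0],
})
weekly = uk_revenue(sample, "W")
monthly = uk_revenue(sample, "MS")
print(monthly)
# With matplotlib installed, weekly.plot() and monthly.plot(kind="bar") draw the charts
```

Resampling one helper with two offset aliases keeps the weekly and monthly views consistent with each other.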
Here are some extracted figures.
The monthly income (Quantity * UnitPrice on purchases) and refunds (Quantity * UnitPrice on returns). The December-January returns follow the increased purchases of October-December: "end of the year" craziness. :)
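One way to build such an income/refund split is to separate Quantity * UnitPrice on the sign of `Quantity`, since negative quantities mark returns. This is a sketch under assumptions: the column names `InvoiceDate`, `Quantity` and `UnitPrice`, the hypothetical helper `monthly_income_and_refunds`, and the made-up sample rows; the Notebook may compute it differently.

```python
import pandas as pd


def monthly_income_and_refunds(df: pd.DataFrame) -> pd.DataFrame:
    """Monthly totals of purchases (Quantity > 0) and refunds (Quantity < 0)."""
    amount = df["Quantity"] * df["UnitPrice"]
    split = pd.DataFrame({
        "InvoiceDate": df["InvoiceDate"],
        # Keep the amount only on purchase rows, 0 elsewhere
        "Income": amount.where(df["Quantity"] > 0, 0.0),
        # Keep returned amounts as positive numbers so both series plot upwards
        "Refunds": (-amount).where(df["Quantity"] < 0, 0.0),
    })
    return split.set_index("InvoiceDate").sort_index().resample("MS").sum()


# Tiny made-up sample: purchases in Nov/Dec, returns in Dec/Jan
sample = pd.DataFrame({
    "InvoiceDate": pd.to_datetime(["2010-11-15", "2010-12-10", "2010-12-20", "2011-01-05"]),
    "Quantity": [3, 5, -1, -2],
    "UnitPrice": [2.0, 4.0, 2.0, 4.0],
})
result = monthly_income_and_refunds(sample)
print(result)
```

Plotting `result` as two series side by side makes the lag between the year-end purchases and the following returns visible.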
The two figures below visualize the question "How often is each product returned (frequency), and in what amount?" They help to spot the outliers: the weakest products and the best-selling ones (with the lowest return rate) in our portfolio.
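The per-product figures above can be approximated by counting distinct return invoices (frequency), summing returned units (amount) and dividing by units sold (return rate). Assumptions, not confirmed by the post: the columns `InvoiceNo`, `Description` and `Quantity`, "amount" measured in units rather than currency, and the hypothetical helper `return_stats` with made-up sample rows.

```python
import pandas as pd


def return_stats(df: pd.DataFrame) -> pd.DataFrame:
    """Per product: units sold, return frequency, units returned and return rate."""
    is_return = df["Quantity"] < 0
    sold = df.loc[~is_return].groupby("Description")["Quantity"].sum()
    returns = (
        df.loc[is_return]
        .assign(ReturnedQty=lambda d: -d["Quantity"])  # store returns as positive units
        .groupby("Description")
        .agg(ReturnCount=("InvoiceNo", "nunique"), ReturnedQty=("ReturnedQty", "sum"))
    )
    # Outer join keeps products that were never returned (candidates for "best-sold")
    stats = pd.DataFrame({"SoldQty": sold}).join(returns, how="outer").fillna(0)
    # Return rate is undefined (NaN) for products with returns but no recorded sales
    stats["ReturnRate"] = stats["ReturnedQty"] / stats["SoldQty"].where(stats["SoldQty"] > 0)
    return stats.sort_values(["ReturnedQty", "ReturnCount"], ascending=False)


# Tiny made-up sample; "C..." invoices are returns
sample = pd.DataFrame({
    "InvoiceNo": ["536365", "536366", "C536400", "C536401", "536367", "C536402", "536368"],
    "Description": ["MUG", "MUG", "MUG", "MUG", "LAMP", "LAMP", "CARD"],
    "Quantity": [6, 4, -2, -1, 10, -5, 12],
})
stats = return_stats(sample)
print(stats)
```

The top rows point at the most returned products, while rows with high `SoldQty` and a `ReturnRate` near 0 point at the best-sold ones.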

The t-test result for the hypothesis "non-UK customers purchase goods in significantly larger amounts than UK customers". Further details can be found in the above-mentioned Notebook or PDF files.
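A sketch of such a test with SciPy's `ttest_ind`, under assumptions the post does not spell out: an order is one `InvoiceNo`, its size is the total purchased `Quantity` (returns excluded), and the test is Welch's one-sided variant (`equal_var=False`, `alternative="greater"`, which needs SciPy 1.6 or later). The helpers `order_sizes` and `compare_uk_vs_rest` and the sample rows are hypothetical; the Notebook's exact test setup may differ.

```python
import pandas as pd
from scipy import stats


def order_sizes(df: pd.DataFrame) -> pd.DataFrame:
    """Total purchased quantity per invoice, tagged with the invoice's country."""
    purchases = df[df["Quantity"] > 0]
    return purchases.groupby("InvoiceNo").agg(
        Country=("Country", "first"),
        OrderQty=("Quantity", "sum"),
    )


def compare_uk_vs_rest(df: pd.DataFrame):
    orders = order_sizes(df)
    is_uk = orders["Country"] == "United Kingdom"
    # H0: equal mean order size; H1: non-UK mean > UK mean (one-sided Welch t-test)
    return stats.ttest_ind(
        orders.loc[~is_uk, "OrderQty"],
        orders.loc[is_uk, "OrderQty"],
        equal_var=False,
        alternative="greater",
    )


# Tiny made-up sample: four UK invoices and four non-UK invoices
sample = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "A3", "A4", "B1", "B2", "B3", "B4"],
    "Country": ["United Kingdom"] * 5 + ["France", "Germany", "EIRE", "Norway"],
    "Quantity": [2, 3, 4, 6, 5, 20, 25, 18, 22],
})
result = compare_uk_vs_rest(sample)
print(result.statistic, result.pvalue)
```

Welch's variant is used here because the UK and non-UK groups are unlikely to share the same size or variance; a p-value below the chosen significance level would support the one-sided hypothesis.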
See the SQL-based study or the Power BI solution on the same topic.
* Compatible Notebook readers/editors: DataCamp online Notebook (requires registration) or Anaconda Jupyter Notebook (requires installation) or the online version of Jupyter