OnlineRetail dataset, Analysis#3 - Power BI

Analysis 3rd attempt using MS Power BI on the DataCamp - OnlineRetail dataset


Additional Power BI #2 - Clustering with Power BI


In the first two runs, I analyzed the dataset with Python and SQL with the help of the in-built visualization of the Notebook offered on DataCamp website and also using matplotlib for special purposes.

Dataset reference: The original dataset (xlsx file) was downloaded from the Dataset owner's site.

The complete dataset and the BI solution is zipped. The package sizes 26 MB and requires installed Microsoft Power BI software (ver 2.130+, made wtih ver 2.131.1203.0).

Microsoft (MS) Power BI software is able to load data easily from Excel (or from multiple files or online databases) and recognize data types automatically so I only had to concentrate on visualization of the required information. I created the simplest figures with minimal style adjustments, as making more elaborate or eye-catching visuals would have taken nearly twice as long to complete the project.

Before the questions and answers, here you find some general information on the data and the dataset:

Dataset general info

First of all, in general the most important data for Sales&Management is what are the flagship products? Of course they already know, but this is a proof of properly working queries on a well designed database/dataset. See in table, and in graph format:

Online retail - best sellers

It is important to know the dataset size, the amount of countries and Users (website user Customers), see the numbers on the right:

or evidently being UK centered it is also benefitial to know the most returned goods in UK or even better to know the users (identifiers) who have returned the most amount of goods, to extract their list and to target them either by some discount or by feedback questionnaire to understand the reason of returns... with an aim to care with them and regain their trust and loyalty toward our company and our products.

In the order of the questions, let's go through the answers.

1) The most returned items



The number of return related entries in the dataset is around 10 620, sent back by 1590 Customers from 30 countries.
Returns - numbers

The details of the Returns - general dashboard

Most returned goods

Returns - by countries
Not surprisingly the returned goods in highest amounts are related to UK customers, it is a direct consequence of that the most purchases are originating from there, as well (data not shown).


At the bottom Treemap, UK related transactions are excluded to demonstrate the contribution of different countries in the amount of returned goods (the tile graph may be optimized in size).

2) Profits earned in the UK, on different timescales

A simple demand, easy to answer using the in-built [Month] or [MonthNo], [Quarter] part of DateTime variable. I used 'Online Retail'[InvoiceDate][MonthNo] to define a new column in the data Table. 

In case of weeks the week number can be determinde by the formula
WEEKNUM('Online Retail'[InvoiceDate])
in a new column.

The profit is considered as the sum of the purchases and returns related costs/refunds, so a 'simple' sum of Quantity*Unit price for all transactions in the given time period. Remember that returned quantities are defined with negative integers.

Profits at different timescales

3) Is non-UK purchased amount significantly larger than UK quantities?

First thing to mention; Power BI does not have an in-built (statistical) t-test functionality, so at the moment I provide only such simple graphs and related data that is the base of a t-test. Expert data analyser 


Note that the larger mean value does not prove immediately that non-UK customers buy goods in significantly larger amount. However the plots would indicate so for a professional Data Scientist, the absolut proof can be made by a t-test (not shown here).

In addition to the above-mentioned dataset-related supplementary information, a lot of other facts and conclusion can be extracted from the dataset, but this post is about answering the DataCamp defined 3 questions using Power BI.

No comments:

Post a Comment

Snowflake universe, part #6 - Forecasting2

Forecasting with built-in ML module Further posts in  Snowflake  topic SnowFlake universe, part#1 SnowFlake, part#2 SnowPark Notebook Snow...