Analysis 3rd attempt using MS Power BI on the DataCamp - OnlineRetail dataset

Additional Power BI #1 - Frequency of purchased quantities in countries

Additional Power BI #2 - Clustering with Power BI

In the first two runs, I analyzed the dataset with Python and SQL with the help of the in-built visualization of the Notebook offered on DataCamp website and also using matplotlib for special purposes.

Dataset reference: The original dataset (xlsx file) was downloaded from the Dataset owner's site.

The complete dataset and the BI solution is zipped. The package sizes 26 MB and requires installed Microsoft Power BI software (ver 2.130+, made wtih ver 2.131.1203.0).

Microsoft (MS) Power BI software is able to load data easily from Excel (or from multiple files or online databases) and recognize data types automatically so I only had to concentrate on visualization of the required information. I created the simplest figures with minimal style adjustments, as making more elaborate or eye-catching visuals would have taken nearly twice as long to complete the project.

Before the questions and answers, here you find some general information on the data and the dataset:

First of all, in general the most important data for Sales&Management is what are the flagship products? Of course they already know, but this is a proof of properly working queries on a well designed database/dataset. See in table, and in graph format:

It is important to know the dataset size, the amount of countries and Users (website user Customers), see the numbers on the right:

or evidently being UK centered it is also benefitial to know the most returned goods in UK or even better to know the users (identifiers) who have returned the most amount of goods, to extract their list and to target them either by some discount or by feedback questionnaire to understand the reason of returns... with an aim to care with them and regain their trust and loyalty toward our company and our products.

In the order of the questions, let's go through the answers.

1) The most returned items

The number of return related entries in the dataset is around 10 620, sent back by 1590 Customers from 30 countries.

The details of the Returns - general dashboard

Not surprisingly the returned goods in highest amounts are related to UK customers, it is a direct consequence of that the most purchases are originating from there, as well (data not shown).

At the bottom Treemap, UK related transactions are excluded to demonstrate the contribution of different countries in the amount of returned goods (the tile graph may be optimized in size).

2) Profits earned in the UK, on different timescales

A simple demand, easy to answer using the in-built [Month] or [MonthNo], [Quarter] part of DateTime variable. I used 'Online Retail'[InvoiceDate][MonthNo] to define a new column in the data Table.

In case of weeks the week number can be determinde by the formula

WEEKNUM('Online Retail'[InvoiceDate])

in a new column.

The profit is considered as the sum of the purchases and returns related costs/refunds, so a 'simple' sum of Quantity*Unit price for all transactions in the given time period. Remember that returned quantities are defined with negative integers.

3) Is non-UK purchased amount significantly larger than UK quantities?

First thing to mention; Power BI does not have an in-built (statistical) t-test functionality, so at the moment I provide only such simple graphs and related data that is the base of a t-test. Expert data analyser

Note that the larger mean value does not prove immediately that non-UK customers buy goods in significantly larger amount. However the plots would indicate so for a professional Data Scientist, the absolut proof can be made by a t-test (not shown here).

In addition to the above-mentioned dataset-related supplementary information, a lot of other facts and conclusion can be extracted from the dataset, but this post is about answering the DataCamp defined 3 questions using Power BI.

Data4every1

Pages

OnlineRetail dataset, Analysis#3 - Power BI

Analysis 3rd attempt using MS Power BI on the DataCamp - OnlineRetail dataset

1) The most returned items

2) Profits earned in the UK, on different timescales

3) Is non-UK purchased amount significantly larger than UK quantities?

No comments:

Post a Comment

Snowflake universe, part #6 - Forecasting2

CollapsibleJava

Pages

document.write(ssyby);

OnlineRetail dataset, Analysis#3 - Power BI

Analysis 3rd attempt using MS Power BI on the DataCamp - OnlineRetail dataset

1) The most returned items

2) Profits earned in the UK, on different timescales

3) Is non-UK purchased amount significantly larger than UK quantities?

No comments:

Post a Comment

Snowflake universe, part #6 - Forecasting2

CollapsibleJava