Analysis of a retail company transaction dataset by MS SQL database.
In the description Microsoft SQL Server Management Studio (SSMS) version 2022 was used to create dataset and to analyse data.
- Checking server and database properties
First of all, to avoid discrepancies it is better to adjust the character set (collation) of the server to match to the collation of the data.-- server COLLATION property SELECT SERVERPROPERTY('collation'); GO
The reply can be for example: SQL_Latin1_General_CP1_CI_AS.
The database was named Questions and Answers, shortly QADB.
USE QADB; GO SELECT DATABASEPROPERTYEX(DB_NAME(), 'collation'); SELECT DB_NAME(); GO
The latter SQL query can be used only when the database has been created. The best if the database collation matches with the loaded data and even better if it matches with the database server collation, leading to fewer errors in special character recognition and consequently during the data analysis/display.
Actually, to create a new database first change/define the context of the current database to the master database. The master database is a system database that contains crucial information about the SQL Server instance, such as system configuration settings, login accounts, and the metadata about all other databases within the instance. For safety reasons of a future complete restart of SQL DB and Query building processes a drop function is inserted which is followed by the real database creation:
USE master; DROP DATABASE IF EXISTS QADB; GO -- creating new database (QADB) CREATE DATABASE QADB; GO
In the case the entire database collation has to be changed use:ALTER DATABASE QADB COLLATE SQL_Latin1_General_CP1_CI_AS;
In another case, when a field of the database collation has to be changed, for example textual entries with special characters for proper display, use:ALTER TABLE QADB ALTER COLUMN city nvarchar(100) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL; COLLATE SQL_Latin1_General_CP1_CI_AS;
Create user (at master db level) with specified access to the (QA) database:CREATE LOGIN NewUser WITH PASSWORD = 'SecurePassword';
This can be set later as well, but for simple analysis it is not required as by creating the database you immediately get access to it as default user.
Start using the created database by switching from master DB to QADB:USE QADB; GO
CREATE SCHEMA KDB AUTHORIZATION dbo; GO
- Creating tables and fields with constraints
Create the Tables of the database with realistic and logical limitations (constraints 1, 2), such as product_price cannot be negative, product_name should not be longer than 50 characters, but can be shorter, which can be defined as variable character length: nvarcharDROP TABLE IF EXISTS KDB.Product; -- safety drop on restart CREATE TABLE KDB.Product ( productID IDENTITY(1000000000000,1) NOT NULL CONSTRAINT PRIMARY KEY, product_name nvarchar(50) NOT NULL, product_price int DEFAULT 0 NOT NULL CHECK (product_price >= 1) );
Constraints CHECK is a limitation that does not allow (from creation with the defined limitation) to enter such values which do not match with the declared condition.
Create another Table to store Warehouse related information such as the quantity of available products. Being a new table it requires an identity (primary key*) field as well: stock ID:CREATE TABLE KDB.ProductRegister ( product_stockID char(13) NOT NULL, -- IDENTITY(1,1) product_quantity INT NOT NULL CHECK (product_quantity >= 1) );
The above defined product_stockID in ProductRegister Table is related to the productID in 1:1 relationship, so these feilds has to be connected in a way that the product_stockID besides being an ID in its own table, requires a foreign key: the productID:ALTER TABLE KDB.ProductRegister ADD CONSTRAINT PK_ProductRegister_product_stockID PRIMARY KEY (product_stockID), CONSTRAINT FK_ProductRegister_productID FOREIGN KEY (product_stockID) REFERENCES KDB.Product(productID); GO
Primary keys and Foreign keys may be named as you like but it is optimal if you define a descriptive name, including its nature "PK_" or "FK_", the name of the field in the current table and in the case of foreign key add the name of the referred primary key in the other table, as well.
A business always have clients and it is important to store the related information in the database:CREATE TABLE KDB.Customers ( customerID INT NOT NULL IDENTITY(1,1), customer_name nvarchar(50) NOT NULL, phonenumber varchar(15), address varchar(100) -- this could be split into parts if required ); ALTER TABLE KDB.Customers ADD CONSTRAINT PK_Customers_customerID PRIMARY KEY (customerID); GO
If Warehouse, Product, and Customer information have their fields defined then only the purchases remain to be registered. There are simple and complex ways to create the required structure, here I made a medium-level complexity solution. We store the purchase (invoice/bill/transaction) main data in one table (SalesHead) and the exact items (codes) and related information are stored in a separate, but connected table SalesDROP TABLE IF EXISTS KDB.SalesHead; -- for safety restart reasons CREATE TABLE KDB.SalesHead ( customerID INT NOT NULL CONSTRAINT FK_Customers_SalesHead_customerID FOREIGN KEY (customerID) REFERENCES KDB.Customers (customerID), transactionID INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_SalesHead_transactionID PRIMARY KEY (transactionID), totalamount INT NOT NULL, CHECK (totalamount >= 1) t_date SMALLDATETIME NOT NULL ); ALTER TABLE KDB.SalesHead ADD CONSTRAINT UQ_SalesHead_salesCustomerID UNIQUE (customerID, transactionID) totalamount; GO
customerID, and transactionID as a combination create such a value pair which is able (as a value pair) to identify the transaction, but those were not chosen as a composite (combined) primary key* because for simple reasons it is better to create an ID for all events separately: transactionID.
The unique nature of the identity (typically a large number or a long alphanumerical value) is provided by the server, as on every new entry the identity value 'increases'.DROP TABLE IF EXISTS KDB.Sales; CREATE TABLE KDB.sales ( salesID INT NOT NULL UNIQUE, CONSTRAINT FK_SalesHeadtransactionID_SalesID FOREIGN KEY (salesID) REFERENCES KDB.SalesHead(transactionID), itemnumber INT NOT NULL IDENTITY(1,1) CONSTRAINT PK_SalesItemNumID PRIMARY KEY (itemnumber), sold_product INT NOT NULL, sold_quantity INT NOT NULL, CHECK (sold_quantity >= 1) );
* Primary key: One or more fields (composite primary key) whose values uniquely identify each
record in a table. A primary key cannot allow Null values and must always have
a unique index. A primary key is used to relate a table to foreign keys in other
tables.
Next steps...