Introduction

This Business Insights Dashboard was designed with a business focus and simulates a real analytics pipeline, end to end: from raw data ingestion, through transformation, to visualisation in Power BI. It's really two projects in one, which meant stepping into several different roles: Data Engineering, Analytics Engineering, Data Analytics, and Business Intelligence.

It touches on technical skills (SQL, data modelling, Power BI development), project delivery skills (planning, documentation, reproducibility), and a stakeholder focus on clarity, usability, and business impact.

I used the publicly available Walmart M5 Forecasting – Accuracy dataset, chosen for its realistic retail structure and multi-hierarchy challenges. I wanted to build a framework that makes it easy to draw meaningful conclusions from sales analysis, forecast performance, and data quality validation, while also giving myself room to experiment with new libraries and techniques.

Objectives

Alongside the technical build, I aimed to answer a few key questions a hiring manager or stakeholder might ask:

What business problem does this solve, and why does it matter?

Many analytics teams struggle with disconnected datasets, inconsistent logic, and slow analysis cycles. This project simulates how a structured data model can align sales, forecast, and data quality views all within a single process.

What business value does this create?

The result is a unified dataset that supports faster decision-making. For example, identifying underperforming product lines, quantifying forecast bias, or flagging data quality issues before they affect reporting.

Process Overview

The ETL (Extract, Transform, Load) process ingests raw data such as sales and product information, then cleans and reshapes it into analytics-ready tables. These tables feed reporting in Power BI, what-if scenario modelling, and data quality testing. The pipeline follows a layered approach (L0 → L1 → L2 → Reporting), with data quality tests run at every layer.
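
To make the layering concrete, here is a minimal sketch of how each layer might be materialised with DuckDB. The file paths, table names, and columns are illustrative, not the project's actual ones (the real M5 sales file is wide and gets unpivoted; this assumes an already-long format):

```python
import duckdb

con = duckdb.connect("warehouse.duckdb")  # hypothetical local database file

# L0 - Raw: ingest source files exactly as received, no transformations
con.sql("CREATE OR REPLACE TABLE l0_sales AS "
        "SELECT * FROM read_csv_auto('data/raw/sales.csv')")
con.sql("CREATE OR REPLACE TABLE l0_calendar AS "
        "SELECT * FROM read_csv_auto('data/raw/calendar.csv')")

# L1 - Standardized: clean, reshape, and join across tables
con.sql("""
    CREATE OR REPLACE TABLE l1_sales AS
    SELECT s.item_id, s.store_id, c.date, s.units_sold
    FROM l0_sales AS s
    JOIN l0_calendar AS c USING (d)   -- 'd' is the shared day key
    WHERE s.units_sold IS NOT NULL
""")

# L2 - Analytics-ready: aggregate into reporting-grade tables
con.sql("""
    CREATE OR REPLACE TABLE l2_store_daily_sales AS
    SELECT store_id, date, SUM(units_sold) AS total_units
    FROM l1_sales
    GROUP BY store_id, date
""")
```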

| Stage | Description | Example Tools / Skills |
| --- | --- | --- |
| L0 – Raw Data (GitHub Code ↗ · GitHub Docs ↗ · GitHub DB Profile ↗) | Ingest source files exactly as received to preserve data fidelity and enable auditing. | SQL, DuckDB, Data Quality Checks |
| L1 – Standardized (GitHub Code ↗ · GitHub Docs ↗ · GitHub DB Profile ↗) | Clean, reshape, and join data across tables for consistency and relational integrity. | CTEs, Joins, Data Validation |
| L2 – Analytics-Ready (GitHub Code 1 ↗ · GitHub Docs 1 ↗ · GitHub Code 2 ↗ · GitHub Docs 2 ↗ · GitHub DB Profile ↗) | Aggregate and calculate metrics such as MAPE and Bias for the reporting layers (see the metric sketch below this table). | SQL Views, KPIs, Aggregations |
| Reporting (GitHub Code ↗ · GitHub Docs ↗ · Power BI Insights) | Deliver insights via interactive visuals designed for business decision-making and modelling (e.g. scenario planning, KPIs, forecasting). | Power BI, DAX, Data Storytelling |
| Data Quality Tests (GitHub Code ↗ · GitHub Docs ↗) | Each layer is tested against known 'truth' values to ensure data integrity is maintained. Tests include row counts, distinct values, nulls, dtypes, column naming, sums, and min/max, each recorded with its truth source, layer, table, column, truth value, and actual value. | |
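
As a sketch of the L2 metric logic (the Parquet path and column names are illustrative, not the project's actual schema), MAPE and Bias can be computed in a single aggregate pass over actuals and forecasts:

```python
import duckdb

# Illustrative forecast-accuracy metrics per store; schema is assumed.
metrics = duckdb.sql("""
    SELECT
        store_id,
        -- MAPE: mean absolute percentage error, skipping zero-actual rows
        AVG(ABS(actual - forecast) / NULLIF(actual, 0)) * 100 AS mape_pct,
        -- Bias: signed tendency to over-forecast (+) or under-forecast (-)
        SUM(forecast - actual) / NULLIF(SUM(actual), 0) * 100 AS bias_pct
    FROM 'data/l2/forecast_vs_actual.parquet'
    GROUP BY store_id
""").df()
print(metrics.head())
```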

Insights

The Power BI dashboard was designed around three key audiences (Executive, Analyst, and Technical) each with distinct needs and perspectives.

Bridging these viewpoints is what every business should strive for. In practice, analysts often find themselves translating between “top-down” executive summaries and “bottom-up” operational reports built on separate datasets. This project demonstrates how both worlds can coexist within a unified, transparent data model.

Built with best practices in mind (e.g. a star schema with defined keys and one-to-many relationships), and with data stored in Parquet format (hosting an online SQL Server was outside the project’s scope), the resulting dashboard (~72 million rows) still performs efficiently, delivering a seamless experience for end users.
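
As an illustration of that storage choice (the paths and partition key are hypothetical, not the project's actual layout), DuckDB can write a fact table straight to partitioned Parquet, which keeps scans cheap for downstream tools:

```python
import duckdb

# Write the analytics-ready fact table to Parquet, partitioned by store,
# so downstream readers only scan the partitions they need.
duckdb.sql("""
    COPY (SELECT * FROM 'data/l2/fact_sales.parquet')
    TO 'data/reporting/fact_sales'
    (FORMAT PARQUET, PARTITION_BY (store_id))
""")
```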

Examples of these pages can be found in Power BI Insights.

Data Quality Checks Overview

This step ensures the data can be trusted, and that missed updates, transformation errors, and similar issues are identified proactively before moving on to the next stage. I created a process mirroring Great Expectations (a data quality testing library): 'truths' are taken from trusted inputs, and a series of tests runs at each stage, recording the test type, truth source, layer, table, column, truth value, and actual value.
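
A stripped-down sketch of that harness (the function, table names, and truth values here are my own illustration, not the project's code):

```python
import duckdb

def run_test(con, test_type, truth_source, layer, table, column, truth_value):
    """Compare a stored 'truth' value against the actual value in a layer table."""
    actual = con.execute(f"SELECT {test_type}({column}) FROM {table}").fetchone()[0]
    return {
        "test_type": test_type, "truth_source": truth_source,
        "layer": layer, "table": table, "column": column,
        "truth_value": truth_value, "actual_value": actual,
        "passed": actual == truth_value,
    }

con = duckdb.connect("warehouse.duckdb")  # hypothetical database file

# e.g. the L1 row count must match the count recorded at ingest time
result = run_test(con, "COUNT", "l0_ingest_manifest", "L1",
                  "l1_sales", "*", truth_value=1_234_567)  # illustrative truth
print(result)
```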

Documentation & Reproducibility

GitHub Code

The GitHub repository keeps all project files version-controlled and traceable from start to finish. It ensures transparency, reproducibility, and business continuity, allowing others to review, replicate, or build upon the work with ease. GitHub Code ↗

Documentation

Documentation is hosted on GitHub and written in markdown. GitHub Wiki ↗

Contents include:

  • Files and descriptions, i.e. what each file's purpose is.
  • How to run the process.
  • Auto-generated pages such as database profiles. I found this particularly useful when writing queries (see the sketch below).
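
For instance, DuckDB's built-in SUMMARIZE statement returns per-column statistics that can be dumped straight to a markdown page for the wiki (the paths and table name here are hypothetical):

```python
import duckdb

con = duckdb.connect("warehouse.duckdb")  # hypothetical database file

# SUMMARIZE returns per-column stats: min, max, null %, approx. distinct, etc.
profile = con.sql("SUMMARIZE l1_sales").df()

# Write a markdown profile page for the wiki (requires the 'tabulate' package)
profile.to_markdown("docs/profiles/l1_sales.md", index=False)
```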

Project Management

Each key action was tracked as an issue within the project’s GitHub board, providing a structured, transparent way to plan, prioritize, and monitor progress throughout development. GitHub Project ↗

Skills & Tools Overview

  • SQL – CTEs, JOINs, UNPIVOT, window functions, temp tables, views, aggregates, CASE logic, etc.
  • Python – pandas, numpy, duckdb, pyarrow, fastparquet, pathlib
  • Power BI – DAX, M-code, service integration