Airbnb Analysis
CRISP-DM–driven analytics using Snowflake & dbt.
1. Overview
This project demonstrates how I approach data consulting engagements using a structured, iterative framework based on CRISP-DM, implemented through dbt and Snowflake.
The case study simulates working with a client exploring entry into the Airbnb rental market. Rather than focusing solely on outputs, it emphasises process: translating business strategy into measurable, data-driven outcomes through clear frameworks, transparent ETL, and repeatable analytical loops.
Key Outcomes:
- End-to-end application of a business-led analytical framework
- Scalable, reproducible ETL pipeline
- Analytics-ready data models to support reporting and decision-making
- Iterative refinement as business questions evolve
2. Process
The project follows a cyclical analytics framework, beginning with business objectives and looping through progressively deeper levels of insight.
Each iteration expands capability: as the business matures, questions become more granular, new datasets are introduced, and metrics evolve to support more sophisticated decisions.
```mermaid
flowchart LR
    A["Define Business Objectives
    Set measurable business goals"]
    B["Acquire & Explore Data
    Collect and understand data sources"]
    C["Enrich & Transform
    Clean, join and prepare data"]
    D["Model & Evaluate
    Test hypotheses and measure performance"]
    E["Deliver Insights
    Visualise and communicate findings"]
    F["Act & Monitor
    Implement changes and track results"]
    A --> B --> C --> D --> E --> F -.-> A
```
Tooling Overview: dbt & Snowflake
This project uses Snowflake as the cloud data warehouse and dbt for data transformation and modelling. Together, they enable:
- A centralised, cloud-based single source of truth
- Clear separation between raw, cleansed, and curated data layers
- Reproducible, version-controlled transformations
- Transparent business logic embedded directly in SQL models
- Safe iteration as business questions evolve
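As a minimal illustration of how this looks in practice, the sketch below shows a version-controlled dbt model for a cleansed layer. The source, model, and column names are assumptions for illustration, not the project's actual schema.

```sql
-- models/l1_cleansed/l1_listings_cleansed.sql (hypothetical sketch)
-- Standardises raw listing data into the cleansed layer.
{{ config(materialized='view') }}

select
    id                                  as listing_id,
    trim(neighbourhood)                 as neighbourhood,
    -- example cleaning step: strip a currency symbol before casting
    cast(replace(price, '$', '') as number(10, 2)) as price,
    room_type
from {{ source('airbnb_raw', 'listings') }}
where id is not null
```

Because the model is plain SQL under version control, the business logic it encodes is transparent and can be safely revised as questions evolve.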
3. Framework Loop 1 - Core Business Case and Project Setup
3.1 Core Business Case
Objective: Enable a prospective Airbnb investor to quickly assess revenue potential for a property purchase using market-level data and simple assumptions. This loop focuses on speed, clarity, and risk reduction, not precision.
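To make those simple assumptions concrete, a curated-layer model could estimate annual revenue per neighbourhood from average nightly price and an assumed flat occupancy rate, along the lines of this hypothetical sketch (model and column names are illustrative):

```sql
-- models/l2_curated/l2_revenue_potential.sql (hypothetical sketch)
-- Naive estimate: average nightly price x assumed occupancy x 365 nights.
{{ config(materialized='table') }}

select
    neighbourhood,
    count(*)                as n_listings,
    avg(price)              as avg_nightly_price,
    -- 60% occupancy is a placeholder assumption, refined in later loops
    avg(price) * 0.60 * 365 as est_annual_revenue
from {{ ref('l1_listings_cleansed') }}
group by neighbourhood
```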
3.2 ETL Creation
With the business objective defined, the first step was to design a reliable ETL (Extract, Transform, Load) pipeline to process the Airbnb dataset efficiently and reproducibly.
This phase covers the Acquire & Explore Data and Enrich & Transform stages of the wider framework.
The pipeline was built around three data layers (Raw, Cleansed, and Curated), each serving a distinct purpose, from storing unmodified source files to producing analysis-ready tables.
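One way to realise these layers in Snowflake is a schema per layer; the database and schema names below are illustrative assumptions, not the project's actual layout:

```sql
-- Hypothetical Snowflake layout: one schema per data layer.
create schema if not exists airbnb.l0_raw;       -- unmodified source files
create schema if not exists airbnb.l1_cleansed;  -- standardised, quality-checked data
create schema if not exists airbnb.l2_curated;   -- aggregated, analysis-ready tables
```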
Please navigate here for more information on the methodology used and a walkthrough of the diagram below.
```mermaid
---
config:
  layout: dagre
---
flowchart TB
    subgraph L0["L0 — Raw Layer"]
        A1["Raw Source Files
        Eg CSV/Parquet"]
    end
    subgraph L4["Data Profiling"]
        B1["Profile Data"]
    end
    subgraph L1["L1 — Cleansed Layer"]
        B2["Standardise Data"]
        B3["Cleanse Data"]
    end
    subgraph L2["L2 — Curated Layer"]
        C1["Aggregate Data"]
        C2["Apply Business Logic"]
        C3["Analytics / Reporting / Modelling"]
    end
    subgraph ETL["ETL Pipeline"]
        direction TB
        L0
        L4
        L1
        L2
    end
    A1 --> B1
    B1 --> B2
    B2 --> B3
    B3 --> C1
    C1 --> C2
    C2 --> C3
    A1 -.-> DB[("Snowflake
    Warehouse + dbt")]
    B3 -.-> DB
    C3 -.-> DB
    DB <-.-> LR[["Local Git Repo"]]
    DB <-.-> KN[["External Tools"]]
```
At every stage, data quality checks were used to maintain consistency and traceability.
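In dbt, checks like these can be written as singular tests: SQL files that return any rows violating a rule, so a non-empty result fails the run. A hypothetical example against the cleansed listings model:

```sql
-- tests/assert_positive_price.sql (hypothetical singular test)
-- Returns offending rows; any result causes the dbt test to fail.
select
    listing_id,
    price
from {{ ref('l1_listings_cleansed') }}
where price is null or price <= 0
```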
All code, documentation, and task tracking were managed through GitHub, ensuring version control and transparency as the project evolved.
3.3 Insights and Actions
This phase covers the Deliver Insights and Act & Monitor stages of the wider framework.