Airbnb

CRISP-DM–driven analytics using Snowflake & dbt.

1. Overview

This project demonstrates how I approach data consulting engagements using a structured, iterative framework based on CRISP-DM, implemented through dbt and Snowflake.

The case study simulates working with a client exploring entry into the Airbnb rental market. Rather than focusing solely on outputs, the emphasis is on process: translating business strategy into measurable, data-driven outcomes through clear frameworks, transparent ETL, and repeatable analytical loops.

Key Outcomes:

  • End-to-end application of a business-led analytical framework
  • Scalable, reproducible ETL pipeline
  • Analytics-ready data models to support reporting and decision-making
  • Iterative refinement as business questions evolve

2. Process

The project follows a cyclical analytics framework, beginning with business objectives and looping through progressively deeper levels of insight.

Each iteration expands capability: as the business matures, questions become more granular, new datasets are introduced, and metrics evolve to support more sophisticated decisions.

```mermaid
flowchart TB
    A["Define Business Objective
    Set measurable business goals"]
    B["Acquire & Explore Data
    Collect and understand data sources"]
    C["Enrich & Transform
    Clean, join and prepare data"]
    D["Model & Evaluate
    Test hypotheses and measure performance"]
    E["Deliver Insights
    Visualise and communicate findings"]
    F["Act & Monitor
    Implement changes and track results"]
    A --> B --> C --> D --> E --> F -.-> A
```

Tooling Overview: dbt & Snowflake

This project uses Snowflake as the cloud data warehouse and dbt for data transformation and modelling. Together, they enable:

  • A centralised, cloud-based single source of truth
  • Clear separation between raw, cleansed, and curated data layers
  • Reproducible, version-controlled transformations
  • Transparent business logic embedded directly in SQL models
  • Safe iteration as business questions evolve

The result is a structured analytics workflow that supports repeatable analysis, reliable reporting, and incremental decision-making.
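As a concrete sketch of what a version-controlled transformation looks like in this setup, a cleansed-layer dbt model might standardise a raw listings table as below. The source, model, and column names here are hypothetical, not the project's actual schema:

```sql
-- models/l1_cleansed/stg_listings.sql (illustrative model; names are assumptions)
with raw as (

    -- reference the raw-layer source declared in dbt, rather than a hard-coded table
    select * from {{ source('airbnb_raw', 'listings') }}

)

select
    id                                              as listing_id,
    lower(trim(room_type))                          as room_type,
    -- strip currency symbols so the price can be typed as a Snowflake number
    cast(replace(price, '$', '') as number(10, 2))  as price_usd
from raw
```

Because the model is plain SQL under version control, changes to cleansing logic are reviewable in Git and reproducible with `dbt run`.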

3. Framework Loop 1 - Core Business Case and Project Setup

3.1 Core Business Case

Objective: Enable a prospective Airbnb investor to quickly assess revenue potential for a property purchase using market-level data and simple assumptions. This loop focuses on speed, clarity, and risk reduction, not precision.
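To make the "simple assumptions" concrete, the revenue question could ultimately reduce to a small curated-layer dbt model along these lines. The model and column names are illustrative, and the 65% occupancy rate is an assumed placeholder rather than a project finding:

```sql
-- models/l2_curated/est_annual_revenue.sql (illustrative only)
select
    neighbourhood,
    avg(price_usd)                as avg_nightly_price_usd,
    -- naive estimate: average nightly price x assumed occupancy x 365 nights
    avg(price_usd) * 0.65 * 365   as est_annual_revenue_usd
from {{ ref('stg_listings') }}
group by neighbourhood
```

A back-of-envelope model like this trades precision for speed, which matches the stated aim of this loop.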

3.2 ETL Creation

With the business objective defined, the first step was to design a reliable ETL (Extract, Transform, Load) pipeline to process the Airbnb dataset efficiently and reproducibly.

This phase covers the Acquire & Explore Data and Enrich & Transform stages of the wider framework.

The pipeline was built around three data layers (Raw, Cleansed, and Curated), each serving a distinct purpose, from storing unmodified source files to producing analysis-ready tables.
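In dbt terms, the Raw layer can be declared as sources, with the Cleansed and Curated layers built as models on top. A minimal source declaration might look like the following (database, schema, and table names are assumptions for illustration):

```yaml
# models/l0_raw/sources.yml (hypothetical names)
version: 2

sources:
  - name: airbnb_raw          # referenced in models via {{ source('airbnb_raw', ...) }}
    database: AIRBNB          # Snowflake database holding unmodified source files
    schema: RAW
    tables:
      - name: listings
      - name: reviews
```

Declaring sources this way keeps the raw layer untouched while giving downstream models a single, documented entry point.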

Please navigate here for more information on the methodology used, and a walkthrough of the diagram below.

```mermaid
---
config:
  layout: dagre
---
flowchart TB
  subgraph L0["L0 — Raw Layer"]
    A1["Raw Source Files
    e.g. CSV/Parquet"]
  end
  subgraph L4["Data Profiling"]
    B1["Profile Data"]
  end
  subgraph L1["L1 — Cleansed Layer"]
    B2["Standardise Data"]
    B3["Cleanse Data"]
  end
  subgraph L2["L2 — Curated Layer"]
    C1["Aggregate Data"]
    C2["Apply Business Logic"]
    C3["Analytics / Reporting / Modelling"]
  end
  subgraph ETL["ETL Pipeline"]
    direction TB
    L0
    L4
    L1
    L2
  end
  A1 --> B1
  B1 --> B2
  B2 --> B3
  B3 --> C1
  C1 --> C2
  C2 --> C3
  A1 -.-> DB[("Snowflake
  Warehouse + dbt")]
  B3 -.-> DB
  C3 -.-> DB
  DB <-.-> LR[["Local Git Repo"]]
  DB <-.-> KN[["External Tools"]]
```

At every stage, data quality checks were used to maintain consistency and traceability.
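In dbt, checks of this kind can be expressed as generic tests in a model's schema file, so they run alongside every build. The model and column names below are illustrative:

```yaml
# models/l1_cleansed/schema.yml (illustrative)
version: 2

models:
  - name: stg_listings
    columns:
      - name: listing_id
        tests:
          - not_null      # every row must have an identifier
          - unique        # no duplicate listings in the cleansed layer
      - name: price_usd
        tests:
          - not_null      # price must survive the standardisation step
```

Running `dbt test` then verifies these expectations on each iteration, which is what makes the consistency and traceability claims enforceable rather than aspirational.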

All code, documentation, and task tracking were managed through GitHub, ensuring version control and transparency as the project evolved.

3.3 Insights and Actions

This phase covers the Deliver Insights and Act & Monitor stages of the wider framework.