dbt (data build tool) is a metadata-driven transformation framework that functions as a DAG-based SQL compiler and execution orchestrator for cloud data warehouses. Internally, it parses project files to construct a dependency graph from ref() and source() calls, then renders Jinja-templated models into plain SQL through its macro engine. Execution is delegated to the warehouse. Parallelism is bounded by the graph topology, since a node runs only after its parents finish, and by the configured thread count. Core artifacts such as manifest.json encode full lineage, configurations, and compiled nodes, while run_results.json records per-node execution results such as status and timing. This architecture positions dbt as a control plane that unifies transformation logic, lineage, testing, and observability within modern data platforms.
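To make the run_results.json point concrete, here is a small Python sketch that summarizes that artifact. It assumes the file's top-level "results" list, where each entry carries unique_id, status, and execution_time. The helper names and the sample node IDs are hypothetical.

```python
import json
from collections import Counter


def load_run_results(path: str = "target/run_results.json") -> dict:
    """Read the artifact dbt writes after a run (default target/ path assumed)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize_run_results(run_results: dict) -> dict:
    """Count statuses, find the slowest node, and total per-node execution time."""
    results = run_results.get("results", [])
    statuses = Counter(r["status"] for r in results)
    slowest = max(results, key=lambda r: r["execution_time"], default=None)
    return {
        "statuses": dict(statuses),
        "slowest": slowest["unique_id"] if slowest else None,
        "total_node_seconds": round(sum(r["execution_time"] for r in results), 2),
    }


if __name__ == "__main__":
    # Hypothetical sample in the shape of run_results.json; a real project
    # would call load_run_results() instead.
    sample = {
        "results": [
            {"unique_id": "model.shop.stg_orders", "status": "success", "execution_time": 1.2},
            {"unique_id": "model.shop.fct_orders", "status": "success", "execution_time": 3.4},
            {"unique_id": "test.shop.not_null_fct_orders_id", "status": "fail", "execution_time": 0.3},
        ]
    }
    print(summarize_run_results(sample))
```

Because the artifact is plain JSON, this kind of telemetry can feed dashboards or alerts without going through dbt itself.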
What dbt Really Is (An Architect's Perspective)
At its core, dbt is a:
👉 Metadata-driven transformation framework
👉 SQL compiler + DAG execution engine
👉 Control plane over warehouse compute
Inside dbt
dbt combines a SQL compiler with a DAG execution framework. Its internals break down into four pieces:
- DAG
- manifest.json
- SQL compiler
- Execution engine
DAG
- dbt scans the project files
- Builds a dependency graph from ref() and source() calls
- Creates a Directed Acyclic Graph; references that form a cycle are rejected
Each node is a project resource, for example a:
- Model
- Test
- Seed
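The DAG-building steps above can be sketched in a few lines of Python. This is a simplified stand-in, not dbt's parser: it assumes single-argument ref('model') and two-argument source('source', 'table') calls, uses regexes instead of Jinja rendering, and uses the standard library's graphlib for ordering. The model names are hypothetical.

```python
import re
from graphlib import TopologicalSorter

# Simplified patterns: ref('x') with one argument, source('src', 'tbl') with two.
REF_RE = re.compile(r"""\bref\(\s*['"](\w+)['"]\s*\)""")
SOURCE_RE = re.compile(r"""\bsource\(\s*['"](\w+)['"]\s*,\s*['"](\w+)['"]\s*\)""")


def build_dag(models: dict[str, str]) -> dict[str, set[str]]:
    """Map each model name to the set of nodes it depends on."""
    graph: dict[str, set[str]] = {}
    for name, sql in models.items():
        deps = set(REF_RE.findall(sql))
        deps |= {f"source:{src}.{tbl}" for src, tbl in SOURCE_RE.findall(sql)}
        graph[name] = deps
    return graph


if __name__ == "__main__":
    models = {
        "stg_orders": "select * from {{ source('shop', 'raw_orders') }}",
        "stg_customers": "select * from {{ source('shop', 'raw_customers') }}",
        "fct_orders": "select o.*, c.name from {{ ref('stg_orders') }} o "
                      "join {{ ref('stg_customers') }} c using (customer_id)",
    }
    dag = build_dag(models)
    # static_order() yields parents before children and raises
    # graphlib.CycleError if the references form a cycle.
    print(list(TopologicalSorter(dag).static_order()))
```

Sources appear only as parents here, which mirrors how they sit at the roots of a dbt lineage graph.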
manifest.json
Written to target/manifest.json, it contains:
- DAG structure
- Model metadata
- Compiled SQL
- Lineage
What it enables:
- Powers dbt docs
- Enables lineage tools
- Integrates with DataHub / OpenLineage
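Lineage tools essentially walk the dependency edges stored in the manifest. The sketch below does that for upstream lineage, assuming each entry under "nodes" has a depends_on.nodes list of parent unique IDs (the real manifest also carries precomputed parent_map and child_map). The trimmed sample manifest and its IDs are hypothetical.

```python
import json


def load_manifest(path: str = "target/manifest.json") -> dict:
    """Read the manifest artifact from dbt's default target/ directory."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def upstream(manifest: dict, unique_id: str) -> set[str]:
    """Return every node or source that unique_id transitively depends on."""
    nodes = manifest["nodes"]
    seen: set[str] = set()
    stack = [unique_id]
    while stack:
        current = stack.pop()
        # Sources live under manifest["sources"], not "nodes", so they
        # contribute no further parents and end the walk on that branch.
        parents = nodes.get(current, {}).get("depends_on", {}).get("nodes", [])
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    # Hypothetical, heavily trimmed manifest with the same nesting.
    manifest = {
        "nodes": {
            "model.shop.stg_orders": {"depends_on": {"nodes": ["source.shop.shop.raw_orders"]}},
            "model.shop.fct_orders": {"depends_on": {"nodes": ["model.shop.stg_orders"]}},
        },
        "sources": {"source.shop.shop.raw_orders": {}},
    }
    print(sorted(upstream(manifest, "model.shop.fct_orders")))
```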
SQL Compiler
Jinja SQL → compiled SQL (written under target/compiled/)
Rendering resolves:
- Macros
- Variables
- Environment configs
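As a rough illustration of the Jinja SQL → compiled SQL step, here is a toy renderer written with the standard library only. It is not Jinja and not dbt's compiler. It only understands {{ ref() }}, {{ source() }}, and {{ var() }}, and the database, schema, and sources parameters are hypothetical stand-ins for what dbt derives from the target environment and source definitions. Real dbt may also quote identifiers depending on the adapter.

```python
import re

# Matches {{ fn(args) }} for the three functions this toy understands.
CALL_RE = re.compile(r"\{\{\s*(ref|source|var)\(([^)]*)\)\s*\}\}")


def _args(raw: str) -> list[str]:
    """Split "'a', 'b'" into ['a', 'b'] (no nested calls or escaped quotes)."""
    return [a.strip().strip("'\"") for a in raw.split(",") if a.strip()]


def compile_sql(template: str, *, database: str, schema: str,
                sources: dict[str, str], variables: dict[str, str]) -> str:
    """Replace each supported call with a concrete relation name or value."""
    def resolve(m: re.Match) -> str:
        fn, args = m.group(1), _args(m.group(2))
        if fn == "ref":
            # The target schema differs between environments (dev vs prod).
            return f"{database}.{schema}.{args[0]}"
        if fn == "source":
            return f"{database}.{sources[args[0]]}.{args[1]}"
        if args[0] in variables:
            return str(variables[args[0]])
        if len(args) > 1:
            return args[1]  # default value supplied in the template
        raise KeyError(f"var {args[0]!r} was not provided")

    return CALL_RE.sub(resolve, template)


if __name__ == "__main__":
    template = "select * from {{ ref('stg_orders') }} where order_date >= '{{ var('start_date') }}'"
    print(compile_sql(template, database="analytics", schema="dbt_dev",
                      sources={"shop": "raw"}, variables={"start_date": "2024-01-01"}))
    # select * from analytics.dbt_dev.stg_orders where order_date >= '2024-01-01'
```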
Execution Engine
❌ Does NOT process data itself
✅ Pushes compiled SQL to the warehouse, which does all the computation
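The execution model can be sketched as a scheduler that only submits SQL. In this hypothetical example, an in-memory SQLite database stands in for the warehouse, and a thread pool sized by a threads argument plays the role of dbt's threads setting. A node is submitted once all of its parents have finished. It deliberately omits dbt's failure handling, such as skipping the children of a failed node.

```python
import sqlite3
import threading
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_dag(graph: dict[str, set[str]], compiled: dict[str, str],
            conn: sqlite3.Connection, threads: int = 4) -> list[str]:
    """Submit each node's SQL once its parents finish; return completion order."""
    lock = threading.Lock()  # one sqlite3 connection cannot run queries concurrently
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on a cycle
    finished: list[str] = []

    def execute(node: str) -> str:
        with lock:
            conn.execute(compiled[node])  # the "warehouse" does the data work
        return node

    with ThreadPoolExecutor(max_workers=threads) as pool:
        pending = set()
        while sorter.is_active():
            for node in sorter.get_ready():
                pending.add(pool.submit(execute, node))
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                node = fut.result()  # re-raises any SQL error
                finished.append(node)
                sorter.done(node)  # may unlock children for the next round
    return finished


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("create table raw_orders (id integer, customer_id integer, amount real)")
    conn.executemany("insert into raw_orders values (?, ?, ?)",
                     [(1, 10, 5.0), (2, 10, 7.5), (3, 20, 2.0)])
    compiled = {
        "stg_orders": "create table stg_orders as select id, customer_id, amount from raw_orders",
        "customer_totals": "create table customer_totals as select customer_id, "
                           "sum(amount) as total from stg_orders group by customer_id",
        "order_count": "create table order_count as select count(*) as n from stg_orders",
    }
    graph = {"stg_orders": set(), "customer_totals": {"stg_orders"}, "order_count": {"stg_orders"}}
    print(run_dag(graph, compiled, conn, threads=2))
    print(conn.execute("select * from customer_totals order by customer_id").fetchall())
    # [(10, 12.5), (20, 2.0)]
```

The two children of stg_orders become ready together, so a real warehouse could run them in parallel; the lock only exists because SQLite is the stand-in here.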