A deep dive into dbt's manifest.json: it is one of the most important internal artifacts in dbt and effectively the brain of your project DAG.
What is manifest.json?
manifest.json is a metadata file that dbt writes to the target/ directory whenever a command parses the project, for example:
dbt run
dbt compile
dbt docs generate
📌 It contains:
- All models, sources, tests, macros
- Dependency graph (DAG)
- Compiled SQL
- Column metadata
- Lineage relationships
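As a quick way to see what a manifest holds, the sketch below counts the entries under nodes by resource_type. The helper names (load_manifest, count_resource_types) and the tiny sample_manifest are illustrative assumptions, not part of dbt; the default path assumes dbt's standard target/ output directory.

```python
import json
from collections import Counter
from pathlib import Path


def load_manifest(path="target/manifest.json"):
    """Read a manifest.json file into a plain dict (path is an assumption)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def count_resource_types(manifest):
    """Count entries in `nodes` by resource_type (model, test, seed, ...)."""
    return Counter(
        node["resource_type"] for node in manifest.get("nodes", {}).values()
    )


# Tiny hand-written stand-in for a real manifest (illustrative only).
sample_manifest = {
    "nodes": {
        "model.retail_dbt.stg_customers": {"resource_type": "model"},
        "model.retail_dbt.dim_customers": {"resource_type": "model"},
        "test.retail_dbt.unique_customer_id": {"resource_type": "test"},
    },
    "sources": {"source.retail_dbt.raw.customers_raw": {}},
}

counts = count_resource_types(sample_manifest)
print(dict(counts))
```

Against a real project you would pass load_manifest() instead of the sample dict.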
👉 Think of it as:
dbt project → parsed → compiled → manifest.json (single source of truth)
📦 High-Level Structure
{
"metadata": {},
"nodes": {},
"sources": {},
"macros": {},
"parent_map": {},
"child_map": {},
"docs": {},
"exposures": {},
"metrics": {}
}
🔍 1. metadata Section
🔹 What it contains
"metadata": {
"dbt_version": "1.x.x",
"project_name": "retail_dbt",
"generated_at": "timestamp",
"adapter_type": "snowflake"
}
🔹 Why it matters
- Tracks dbt version compatibility
- Helps debugging pipeline issues
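One way to use the metadata block for debugging is a small sanity check before a pipeline consumes the manifest. This is a minimal sketch: check_metadata and parse_version are hypothetical helpers, and the version number in sample_metadata is made up for illustration.

```python
import re


def parse_version(version):
    """Turn a version string such as '1.7.4' into a comparable tuple (1, 7, 4)."""
    numbers = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep leading digits, drop suffixes like 'rc1'
        numbers.append(int(match.group()) if match else 0)
    return tuple(numbers)


def check_metadata(metadata, expected_adapter, min_version):
    """Return a list of human-readable problems; an empty list means no issues."""
    problems = []
    adapter = metadata.get("adapter_type")
    if adapter != expected_adapter:
        problems.append(f"adapter is {adapter!r}, expected {expected_adapter!r}")
    version = metadata.get("dbt_version", "0.0.0")
    if parse_version(version) < parse_version(min_version):
        problems.append(f"dbt {version} is older than required {min_version}")
    return problems


# Made-up values in the shape shown above; a real manifest supplies its own.
sample_metadata = {
    "dbt_version": "1.7.4",
    "project_name": "retail_dbt",
    "adapter_type": "snowflake",
}

print(check_metadata(sample_metadata, "snowflake", "1.5.0"))
print(check_metadata(sample_metadata, "bigquery", "1.8.0"))
```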
🧩 2. nodes (CORE of dbt)
This is the most important section.
🔹 What are nodes?
Everything dbt builds:
- models
- tests
- seeds
- snapshots
🔹 Example: Model Node
"model.retail_dbt.stg_customers": {
  "resource_type": "model",
  "name": "stg_customers",
  "database": "RETAIL_DB",
  "schema": "STAGING",
  "alias": "stg_customers",
  "raw_code": "SELECT * FROM {{ source('raw','customers_raw') }}",
  "compiled_code": "SELECT * FROM RETAIL_DB.RAW.CUSTOMERS_RAW",
  "depends_on": {
    "nodes": ["source.retail_dbt.raw.customers_raw"]
  },
  "config": {
    "materialized": "view"
  }
}
🔹 Key Fields Explained
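✅ raw_code
- Your original SQL, with the Jinja still in place
✅ compiled_code
- The SQL after Jinja is rendered: ref() and source() calls are replaced with concrete relation names. The materialization wraps this query in the statement that actually runs on Snowflake
✅ depends_on
- Lists the upstream nodes this node uses; these are the DAG edges
✅ config
- Node configuration, including the materialization (view/table/incremental)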
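To make these fields concrete, here is a small sketch that condenses a model node into the parts discussed above. summarize_model is a hypothetical helper; the sample node copies the stg_customers example, and the code assumes compiled_code may be missing when a node has not been compiled.

```python
def summarize_model(node):
    """Pull out the fields most useful when debugging a single model node."""
    raw = node.get("raw_code", "")
    compiled = node.get("compiled_code")  # assumed optional: absent if not compiled
    return {
        "name": node.get("name"),
        "materialized": node.get("config", {}).get("materialized"),
        "upstream": node.get("depends_on", {}).get("nodes", []),
        "uses_jinja": "{{" in raw or "{%" in raw,
        "is_compiled": compiled is not None,
    }


# The stg_customers node from the example above, trimmed to the fields used here.
sample_node = {
    "resource_type": "model",
    "name": "stg_customers",
    "raw_code": "SELECT * FROM {{ source('raw','customers_raw') }}",
    "compiled_code": "SELECT * FROM RETAIL_DB.RAW.CUSTOMERS_RAW",
    "depends_on": {"nodes": ["source.retail_dbt.raw.customers_raw"]},
    "config": {"materialized": "view"},
}

summary = summarize_model(sample_node)
print(summary)
```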
🌐 3. sources
🔹 Example
"source.retail_dbt.raw.customers_raw": {
  "database": "RETAIL_DB",
  "schema": "RAW",
  "identifier": "CUSTOMERS_RAW"
}
🔹 Purpose
- Maps dbt → physical tables
- Enables lineage tracking
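The mapping from a dbt source to a physical table can be sketched by joining the three fields shown in the example. source_relations is a hypothetical helper, and the plain dot-join ignores adapter-specific quoting rules.

```python
def source_relations(manifest):
    """Map each source unique_id to a 'database.schema.identifier' string."""
    relations = {}
    for source_id, source in manifest.get("sources", {}).items():
        parts = [
            source.get("database"),
            source.get("schema"),
            source.get("identifier"),
        ]
        # Skip missing parts so a source without a database still yields a name.
        relations[source_id] = ".".join(part for part in parts if part)
    return relations


# Sample data copied from the source example above.
sample_manifest = {
    "sources": {
        "source.retail_dbt.raw.customers_raw": {
            "database": "RETAIL_DB",
            "schema": "RAW",
            "identifier": "CUSTOMERS_RAW",
        }
    }
}

relations = source_relations(sample_manifest)
print(relations)
```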
🧬 4. parent_map (UPSTREAM)
🔹 Example
"model.retail_dbt.stg_customers": [
  "source.retail_dbt.raw.customers_raw"
]
🔹 Meaning
- Who feeds into this model
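Because parent_map only lists direct parents, finding everything upstream of a model means walking the map repeatedly. The following is a minimal breadth-first sketch; all_upstream is a hypothetical helper and the sample map is hand-written in the shape shown above.

```python
from collections import deque


def all_upstream(parent_map, node_id):
    """Return every direct and indirect parent of node_id."""
    seen = set()
    queue = deque(parent_map.get(node_id, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        queue.extend(parent_map.get(current, []))
    return seen


# Hand-written sample: source -> stg_customers -> dim_customers.
sample_parent_map = {
    "source.retail_dbt.raw.customers_raw": [],
    "model.retail_dbt.stg_customers": ["source.retail_dbt.raw.customers_raw"],
    "model.retail_dbt.dim_customers": ["model.retail_dbt.stg_customers"],
}

upstream = all_upstream(sample_parent_map, "model.retail_dbt.dim_customers")
print(sorted(upstream))
```

The same function works unchanged on child_map to collect everything downstream of a node.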
🔗 5. child_map (DOWNSTREAM)
🔹 Example
"model.retail_dbt.stg_customers": [
  "model.retail_dbt.dim_customers"
]
🔹 Meaning
- Who depends on this model
🧠 DAG Insight
Together:
parent_map + child_map = full DAG graph
This powers:
- dbt lineage UI
- model execution order
🧰 6. macros
🔹 Example
"macro.dbt_utils.generate_surrogate_key": {
  "name": "generate_surrogate_key",
  "macro_sql": "md5(concat(...))"
}
🔹 Purpose
- Stores reusable logic
- Used during compilation
🧪 7. tests
Stored inside nodes
🔹 Example
"test.retail_dbt.unique_customer_id": {
  "resource_type": "test",
  "depends_on": {
    "nodes": ["model.retail_dbt.stg_customers"]
  }
}
🔹 Purpose
- Defines validation logic
- Linked to models
📚 8. docs & exposures
🔹 Docs
- Column descriptions
- Model descriptions
🔹 Exposures
"exposure.retail_dbt.sales_dashboard": {
  "type": "dashboard",
  "depends_on": {
    "nodes": ["model.retail_dbt.fact_orders"]
  }
}
👉 Connects dbt → BI tools like:
- Tableau
- Power BI
HOW dbt USES manifest.json INTERNALLY
Step-by-step:
1. Parse Phase
- Reads SQL + YAML → builds nodes
2. Compile Phase
- Resolves:
  - ref()
  - source()
  - macros
3. DAG Build
- Uses:
  - depends_on
  - parent_map
4. Execution Planning
- Orders models correctly
5. Run Phase
- Executes compiled_code
6. Docs Generation
- Uses manifest for lineage graph
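The DAG build and execution planning steps can be approximated with a topological sort over depends_on. This is only a sketch using Python's standard graphlib module (Python 3.9+), not dbt's own scheduler, which also handles threading and node selection; execution_order and the sample data are illustrative assumptions.

```python
from graphlib import TopologicalSorter


def execution_order(manifest):
    """Return node ids from `nodes` so that every parent precedes its children."""
    nodes = manifest.get("nodes", {})
    # TopologicalSorter expects {node: set_of_predecessors}.
    graph = {
        node_id: set(node.get("depends_on", {}).get("nodes", []))
        for node_id, node in nodes.items()
    }
    ordered = TopologicalSorter(graph).static_order()
    # Sources appear as predecessors but are not built, so filter them out.
    return [node_id for node_id in ordered if node_id in nodes]


# Hand-written fragment, deliberately listed out of order.
sample_manifest = {
    "nodes": {
        "model.retail_dbt.dim_customers": {
            "depends_on": {"nodes": ["model.retail_dbt.stg_customers"]}
        },
        "model.retail_dbt.stg_customers": {
            "depends_on": {"nodes": ["source.retail_dbt.raw.customers_raw"]}
        },
    }
}

order = execution_order(sample_manifest)
print(order)
```

A cycle in depends_on would make static_order() raise graphlib.CycleError, which mirrors the fact that a dbt project must form an acyclic graph.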
🎯 Key Takeaways
- manifest.json = central brain of dbt
- Stores:
  - models
  - dependencies
  - compiled SQL
- Powers:
  - execution
  - lineage
  - CI/CD
- Essential for:
  - debugging
  - optimization
  - orchestration