Thursday, March 26, 2026

dbt manifest.json internals

A deep dive into dbt's manifest.json: one of the most important internal artifacts in dbt, and effectively the brain of your project DAG.

What is manifest.json?

manifest.json is a metadata artifact that dbt writes to the target/ directory whenever it parses your project, for example when you run:

dbt run
dbt compile
dbt docs generate

📌 It contains:

  • All models, sources, tests, macros
  • Dependency graph (DAG)
  • Compiled SQL
  • Column metadata
  • Lineage relationships

👉 Think of it as:

dbt project → parsed → compiled → manifest.json (single source of truth)

📦 High-Level Structure

{
  "metadata": {},
  "nodes": {},
  "sources": {},
  "macros": {},
  "parent_map": {},
  "child_map": {},
  "docs": {},
  "exposures": {},
  "metrics": {}
}
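
To check which of these sections a given manifest actually contains, a small Python sketch like the one below can help. It assumes the default target/manifest.json location; the helper names load_manifest and summarize_sections are made up for illustration, and the sample dict is only a stand-in for a real manifest.

```python
import json
from pathlib import Path


def load_manifest(path="target/manifest.json"):
    """Read manifest.json into a plain dict (the path is an assumed default)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize_sections(manifest):
    """Return {top-level key: number of entries} for dict- or list-valued sections."""
    return {
        key: len(value)
        for key, value in manifest.items()
        if isinstance(value, (dict, list))
    }


# Tiny stand-in for a real manifest so the sketch runs without a dbt project.
sample = {
    "metadata": {"dbt_version": "1.x.x"},
    "nodes": {"model.retail_dbt.stg_customers": {}},
    "sources": {"source.retail_dbt.raw.customers_raw": {}},
    "macros": {},
}

for section, count in summarize_sections(sample).items():
    print(f"{section}: {count} entries")
```

Against a real project you would call summarize_sections(load_manifest()) instead of using the sample dict.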

🔍 1. metadata Section

🔹 What it contains

"metadata": {
  "dbt_version": "1.x.x",
  "project_name": "retail_dbt",
  "generated_at": "timestamp",
  "adapter_type": "snowflake"
}

🔹 Why it matters

  • Tracks dbt version compatibility
  • Helps debugging pipeline issues
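
As a rough illustration of the version-compatibility point, the sketch below compares the manifest's dbt_version against an expected major.minor pair. The version string "1.7.4" and the helper names are hypothetical; a real pipeline may need a stricter or looser policy.

```python
def major_minor(version):
    """Turn '1.7.4' into (1, 7); anything after the minor part is ignored."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_compatible(manifest, expected):
    """True when the manifest's dbt_version matches the expected (major, minor)."""
    return major_minor(manifest["metadata"]["dbt_version"]) == expected


sample = {
    "metadata": {
        "dbt_version": "1.7.4",  # hypothetical value for illustration
        "project_name": "retail_dbt",
        "adapter_type": "snowflake",
    }
}

if not is_compatible(sample, (1, 7)):
    raise SystemExit("manifest was generated by an unexpected dbt version")
print("manifest version looks compatible")
```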

🧩 2. nodes (CORE of dbt)

This is the most important section.

🔹 What are nodes?

Everything dbt builds:

  • models
  • tests
  • seeds
  • snapshots

🔹 Example: Model Node

"model.retail_dbt.stg_customers": {
  "resource_type": "model",
  "name": "stg_customers",
  "database": "RETAIL_DB",
  "schema": "STAGING",
  "alias": "stg_customers",
  "raw_code": "SELECT * FROM {{ source('raw','customers_raw') }}",
  "compiled_code": "SELECT * FROM RETAIL_DB.RAW.CUSTOMERS_RAW",
  "depends_on": {
    "nodes": ["source.retail_dbt.raw.customers_raw"]
  },
  "config": {
    "materialized": "view"
  }
}

🔹 Key Fields Explained

raw_code

  • Your original SQL with Jinja

compiled_code

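  • The rendered SQL with all Jinja resolved; dbt wraps it in the materialization's DDL/DML before sending it to the warehouse (Snowflake in this example)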

depends_on

  • Defines DAG edges

config

  • Materialization (view/table/incremental)
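
To make these fields concrete, here is a minimal sketch that walks the nodes section and pulls out the materialization, the upstream dependencies, and whether the raw SQL still contains Jinja. The function name describe_models is an illustrative assumption, and the sample reuses the model node shown above.

```python
def describe_models(manifest):
    """Summarize each model node: materialization, Jinja usage, upstream ids."""
    rows = []
    for unique_id, node in manifest["nodes"].items():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots
        raw = node.get("raw_code", "")
        rows.append({
            "unique_id": unique_id,
            "materialized": node.get("config", {}).get("materialized"),
            "has_jinja": "{{" in raw or "{%" in raw,
            "upstream": list(node.get("depends_on", {}).get("nodes", [])),
        })
    return rows


sample = {
    "nodes": {
        "model.retail_dbt.stg_customers": {
            "resource_type": "model",
            "raw_code": "SELECT * FROM {{ source('raw','customers_raw') }}",
            "compiled_code": "SELECT * FROM RETAIL_DB.RAW.CUSTOMERS_RAW",
            "depends_on": {"nodes": ["source.retail_dbt.raw.customers_raw"]},
            "config": {"materialized": "view"},
        }
    }
}

for row in describe_models(sample):
    print(row)
```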

🌐 3. sources

🔹 Example

"source.retail_dbt.raw.customers_raw": {
  "database": "RETAIL_DB",
  "schema": "RAW",
  "identifier": "CUSTOMERS_RAW"
}

🔹 Purpose

  • Maps dbt → physical tables
  • Enables lineage tracking
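
The mapping from a dbt source to a physical table can be sketched as below. This assumes the three fields shown in the example are enough to build the relation name; source_relations is a hypothetical helper, and real adapters may quote or case identifiers differently.

```python
def source_relations(manifest):
    """Map each source unique_id to a dotted database.schema.identifier name."""
    relations = {}
    for unique_id, source in manifest.get("sources", {}).items():
        parts = (source.get("database"), source.get("schema"), source.get("identifier"))
        # Drop missing parts so a source without a database still yields a name.
        relations[unique_id] = ".".join(part for part in parts if part)
    return relations


sample = {
    "sources": {
        "source.retail_dbt.raw.customers_raw": {
            "database": "RETAIL_DB",
            "schema": "RAW",
            "identifier": "CUSTOMERS_RAW",
        }
    }
}

for unique_id, relation in source_relations(sample).items():
    print(f"{unique_id} -> {relation}")
```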

🧬 4. parent_map (UPSTREAM)

🔹 Example

"model.retail_dbt.stg_customers": [ "source.retail_dbt.raw.customers_raw" ]

🔹 Meaning

  • Who feeds into this model

🔗 5. child_map (DOWNSTREAM)

🔹 Example

"model.retail_dbt.stg_customers": [ "model.retail_dbt.dim_customers" ]

🔹 Meaning

  • Who depends on this model

🧠 DAG Insight

Together:

parent_map + child_map = the full DAG, readable in both directions

This powers:

  • dbt lineage UI
  • model execution order
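
As a sketch of how lineage can be read from these two maps, the breadth-first walk below collects everything reachable from a starting node. Pass child_map for downstream impact or parent_map for upstream lineage. The function name reachable is an assumption, and the sample maps extend the examples above with the source node.

```python
from collections import deque


def reachable(edge_map, start):
    """Return every node reachable from start by following edge_map transitively."""
    seen = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in edge_map.get(current, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


child_map = {
    "source.retail_dbt.raw.customers_raw": ["model.retail_dbt.stg_customers"],
    "model.retail_dbt.stg_customers": ["model.retail_dbt.dim_customers"],
    "model.retail_dbt.dim_customers": [],
}
parent_map = {
    "source.retail_dbt.raw.customers_raw": [],
    "model.retail_dbt.stg_customers": ["source.retail_dbt.raw.customers_raw"],
    "model.retail_dbt.dim_customers": ["model.retail_dbt.stg_customers"],
}

print("downstream of the source:", sorted(reachable(child_map, "source.retail_dbt.raw.customers_raw")))
print("upstream of dim_customers:", sorted(reachable(parent_map, "model.retail_dbt.dim_customers")))
```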

🧰 6. macros

🔹 Example

"macro.dbt_utils.generate_surrogate_key": {
  "name": "generate_surrogate_key",
  "macro_sql": "md5(concat(...))"
}

🔹 Purpose

  • Stores reusable logic
  • Used during compilation
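
One practical use of the macro metadata is finding which models call a given macro. The sketch below relies on the depends_on.macros list that dbt records on each node; that field is not shown in the examples above, so treat it as an assumption to verify against your own manifest. The model dim_customers and the helper name are illustrative.

```python
def models_using_macro(manifest, macro_id):
    """Return sorted unique_ids of nodes whose depends_on.macros lists macro_id."""
    return sorted(
        unique_id
        for unique_id, node in manifest["nodes"].items()
        if macro_id in node.get("depends_on", {}).get("macros", [])
    )


sample = {
    "nodes": {
        "model.retail_dbt.dim_customers": {
            "resource_type": "model",
            "depends_on": {
                "nodes": ["model.retail_dbt.stg_customers"],
                "macros": ["macro.dbt_utils.generate_surrogate_key"],
            },
        },
        "model.retail_dbt.stg_customers": {
            "resource_type": "model",
            "depends_on": {"nodes": [], "macros": []},
        },
    }
}

print(models_using_macro(sample, "macro.dbt_utils.generate_surrogate_key"))
```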

🧪 7. tests

Stored inside nodes

🔹 Example

"test.retail_dbt.unique_customer_id": {
  "resource_type": "test",
  "depends_on": {
    "nodes": ["model.retail_dbt.stg_customers"]
  }
}

🔹 Purpose

  • Defines validation logic
  • Linked to models
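
Because tests point at the models they validate through depends_on, the manifest can be used to spot models with no tests at all. This is a minimal sketch under that assumption; untested_models is a hypothetical helper, and dim_customers is included only to show an untested case.

```python
def untested_models(manifest):
    """Return sorted model unique_ids that no test node depends on."""
    nodes = manifest["nodes"]
    models = {uid for uid, node in nodes.items() if node.get("resource_type") == "model"}
    tested = set()
    for node in nodes.values():
        if node.get("resource_type") == "test":
            tested.update(node.get("depends_on", {}).get("nodes", []))
    return sorted(models - tested)


sample = {
    "nodes": {
        "model.retail_dbt.stg_customers": {"resource_type": "model"},
        "model.retail_dbt.dim_customers": {"resource_type": "model"},
        "test.retail_dbt.unique_customer_id": {
            "resource_type": "test",
            "depends_on": {"nodes": ["model.retail_dbt.stg_customers"]},
        },
    }
}

print(untested_models(sample))
```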

📚 8. docs & exposures

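🔹 Docs

  • Reusable documentation blocks defined with {% docs %} / {% enddocs %}
  • Model and column descriptions themselves are stored on each node (description, columns)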

🔹 Exposures

"exposure.retail_dbt.sales_dashboard": {
  "type": "dashboard",
  "depends_on": {
    "nodes": ["model.retail_dbt.fact_orders"]
  }
}

👉 Connects dbt → BI tools like:

  • Tableau
  • Power BI
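
Exposures make it possible to answer "which dashboards break if this model changes?". The sketch below inverts the exposure dependencies to do that. The helper name exposures_for_model is an assumption, the sample follows the exposure.<project>.<name> id pattern, and it only checks direct dependencies, not models further upstream.

```python
def exposures_for_model(manifest, model_id):
    """Return sorted exposure unique_ids that directly depend on model_id."""
    return sorted(
        unique_id
        for unique_id, exposure in manifest.get("exposures", {}).items()
        if model_id in exposure.get("depends_on", {}).get("nodes", [])
    )


sample = {
    "exposures": {
        "exposure.retail_dbt.sales_dashboard": {
            "type": "dashboard",
            "depends_on": {"nodes": ["model.retail_dbt.fact_orders"]},
        }
    }
}

print(exposures_for_model(sample, "model.retail_dbt.fact_orders"))
```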

HOW dbt USES manifest.json INTERNALLY

Step-by-step:

1. Parse Phase

  • Reads SQL + YAML → builds nodes

2. Compile Phase

  • Resolves:
    • ref()
    • source()
    • macros

3. DAG Build

  • Uses:
    • depends_on
    • parent_map

4. Execution Planning

  • Orders models correctly

5. Run Phase

  • Executes compiled_code

6. Docs Generation

  • Uses manifest for lineage graph
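
The DAG build and execution planning steps can be approximated outside dbt with a topological sort over depends_on. This is a simplified sketch using Python's standard graphlib module (Python 3.9+), not dbt's actual scheduler; execution_order is a hypothetical helper, and sources are dropped from the graph because dbt does not build them.

```python
from graphlib import TopologicalSorter


def execution_order(manifest):
    """Order node ids so every node appears after the nodes it depends on."""
    nodes = manifest["nodes"]
    graph = {
        unique_id: {
            dep
            for dep in node.get("depends_on", {}).get("nodes", [])
            if dep in nodes  # ignore sources and other external inputs
        }
        for unique_id, node in nodes.items()
    }
    # TopologicalSorter expects {node: predecessors} and yields predecessors first.
    return list(TopologicalSorter(graph).static_order())


sample = {
    "nodes": {
        "model.retail_dbt.dim_customers": {
            "depends_on": {"nodes": ["model.retail_dbt.stg_customers"]},
        },
        "model.retail_dbt.stg_customers": {
            "depends_on": {"nodes": ["source.retail_dbt.raw.customers_raw"]},
        },
    }
}

for step, unique_id in enumerate(execution_order(sample), start=1):
    print(step, unique_id)
```

A cycle in the graph would make static_order raise graphlib.CycleError, which mirrors the fact that a dbt project must form a DAG.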

🎯 Key Takeaways

  • manifest.json = central brain of dbt
  • Stores:
    • models
    • dependencies
    • compiled SQL
  • Powers:
    • execution
    • lineage
    • CI/CD
  • Essential for:
    • debugging
    • optimization
    • orchestration
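
For the CI/CD use case, a common pattern is comparing the manifest from the last production run with the one from the current branch. The sketch below flags models that are new or whose file checksum changed. It assumes each node carries a checksum object with a checksum value; the hash strings are made up, and dbt's own state:modified selector compares more than this.

```python
def changed_models(old_manifest, new_manifest):
    """Return sorted model ids that are new or whose checksum differs from old."""
    changed = []
    old_nodes = old_manifest.get("nodes", {})
    for unique_id, node in new_manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue
        old_node = old_nodes.get(unique_id)
        new_sum = node.get("checksum", {}).get("checksum")
        old_sum = old_node.get("checksum", {}).get("checksum") if old_node else None
        if old_node is None or new_sum != old_sum:
            changed.append(unique_id)
    return sorted(changed)


# Hypothetical checksums; real values are hashes of the model file contents.
old = {"nodes": {
    "model.retail_dbt.stg_customers": {"resource_type": "model", "checksum": {"checksum": "aaa"}},
    "model.retail_dbt.dim_customers": {"resource_type": "model", "checksum": {"checksum": "bbb"}},
}}
new = {"nodes": {
    "model.retail_dbt.stg_customers": {"resource_type": "model", "checksum": {"checksum": "aaa"}},
    "model.retail_dbt.dim_customers": {"resource_type": "model", "checksum": {"checksum": "ccc"}},
    "model.retail_dbt.fact_orders": {"resource_type": "model", "checksum": {"checksum": "ddd"}},
}}

print(changed_models(old, new))
```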
