Sunday, March 29, 2026

dbt MCP Server




The dbt MCP server architecture allows your agent to connect to a variety of tools.

SQL: Tools for executing and generating SQL on dbt Platform infrastructure.

  • execute_sql: Executes SQL on dbt Platform infrastructure with Semantic Layer support.
  • text_to_sql: Generates SQL from natural language using project context.

Semantic Layer: Tools for querying metrics and dimensions through the dbt Semantic Layer.

  • get_dimensions: Gets dimensions for specified metrics.
  • get_entities: Gets entities for specified metrics.
  • get_metrics_compiled_sql: Returns compiled SQL for metrics without executing the query.
  • list_metrics: Retrieves all defined metrics.
  • list_saved_queries: Retrieves all saved queries.
  • query_metrics: Executes metric queries with filtering and grouping options.

Discovery: Tools for exploring project metadata through the dbt Discovery API.

  • get_all_macros: Retrieves macros; option to filter by package or return package names only.
  • get_all_models: Retrieves name and description of all models.
  • get_all_sources: Gets all sources with freshness status; option to filter by source name.
  • get_exposure_details: Gets exposure details including owner, parents, and freshness status.
  • get_exposures: Gets all exposures (downstream dashboards, apps, or analyses).
  • get_lineage: Gets full lineage graph (ancestors and descendants) with type and depth filtering.
  • get_macro_details: Gets details for a specific macro.
  • get_mart_models: Retrieves all mart models.
  • get_model_children: Gets downstream dependents of a model.
  • get_model_details: Gets model details including compiled SQL, columns, and schema.
  • get_model_health: Gets health signals: run status, test results, and upstream source freshness.
  • get_model_parents: Gets upstream dependencies of a model.
  • get_model_performance: Gets execution history for a model; option to include test results.
  • get_related_models: Finds similar models using semantic search.
  • get_seed_details: Gets details for a specific seed.
  • get_semantic_model_details: Gets details for a specific semantic model.
  • get_snapshot_details: Gets details for a specific snapshot.
  • get_source_details: Gets source details including columns and freshness.
  • get_test_details: Gets details for a specific test.
  • search: [Alpha] Searches for resources across the dbt project (not generally available).

dbt CLI: Tools for running dbt commands. Allowing your client to run dbt commands through the MCP tooling can modify your data models, sources, and warehouse objects. Proceed only if you trust the client and understand the potential impact.

  • build: Executes models, tests, snapshots, and seeds in DAG order.
  • compile: Generates executable SQL from models/tests/analyses; useful for validating Jinja logic.
  • docs: Generates documentation for the dbt project.
  • get_lineage_dev: Retrieves lineage from local manifest.json with type and depth filtering.
  • get_node_details_dev: Retrieves node details from local manifest.json (models, seeds, snapshots, sources).
  • list: Lists resources in the dbt project by type with selector support.
  • parse: Parses and validates project files for syntax correctness.
  • run: Executes models to materialize them in the database.
  • show: Executes SQL against the database and returns results.
  • test: Runs tests to validate data and model integrity.

Admin API: Tools for managing jobs and runs through the dbt Administrative API.

  • cancel_job_run: Cancels a running job.
  • get_job_details: Gets job configuration including triggers, schedule, and dbt commands.
  • get_job_run_artifact: Downloads a specific artifact file from a job run.
  • get_job_run_details: Gets run details including status, timing, steps, and artifacts.
  • get_job_run_error: Gets error and/or warning details for a job run; option to include or show warnings only.
  • get_project_details: Gets project information for a specific dbt project.
  • list_job_run_artifacts: Lists available artifacts from a job run.
  • list_jobs: Lists jobs in a dbt Platform account; option to filter by project or environment.
  • list_jobs_runs: Lists job runs; option to filter by job, status, or order by field.
  • list_projects: Lists all projects in the dbt Platform account.
  • retry_job_run: Retries a failed job run.
  • trigger_job_run: Triggers a job run; option to override git branch, schema, or other settings.

dbt Codegen: These tools help automate boilerplate code generation for dbt project files.

  • generate_model_yaml: Generates model YAML with columns; option to inherit upstream descriptions.
  • generate_source: Generates source YAML by introspecting database schemas; option to include columns.
  • generate_staging_model: Generates staging model SQL from a source table.

dbt LSP: A set of tools that leverage the Fusion engine for advanced SQL compilation and column-level lineage analysis.

  • fusion.compile_sql: Compiles SQL in project context via dbt Platform.
  • fusion.get_column_lineage: Traces column-level lineage via dbt Platform.
  • get_column_lineage: Traces column-level lineage locally (requires dbt-lsp via dbt Labs VSCE).

Product Docs: Tools for searching and fetching content from the official dbt documentation at docs.getdbt.com.

  • get_product_doc_pages: Fetches the full Markdown content of one or more docs.getdbt.com pages by path or URL.
  • search_product_docs: Searches docs.getdbt.com for pages matching a query; returns titles, URLs, and descriptions ranked by relevance. Use get_product_doc_pages to fetch full content.

MCP Server Metadata: These tools provide information about the MCP server.

  • get_mcp_server_branch: Returns the current git branch of the running dbt MCP server.
  • get_mcp_server_version: Returns the current version of the dbt MCP server.

    Friday, March 27, 2026

    dbt with Databricks Data Pipeline

     

    1. The Databricks Lakehouse Platform (The Engine)

    This is the "Data Plane" and "Compute Plane" at the bottom left.

    • Data Plane: Raw data lands as Delta Tables. The key internal here is the ACID Transaction Log (_delta_log), which ensures that even if a dbt run fails halfway through, your data stays consistent.

    • Intelligence Plane (Unity Catalog): This is the "brain" that manages the Metastore, security, and audit logs. It allows dbt to know exactly where the tables live and who has permission to change them.

    • Compute Plane: When you execute a dbt command, it triggers Photon Engines or SQL Warehouses. Databricks does the actual "crunching" of the numbers; dbt just sends the instructions.

    2. The dbt Core/Cloud Project (The Logic)

    This is the "Logical Plane" at the top, representing your code repository (Git).

    • profiles.yml: The "handshake" file that tells dbt how to log into Databricks using a Personal Access Token (PAT).

    • Models (Staging & Marts): Your SQL files. Staging models handle the "Silver" cleaning (renaming, casting), while Marts models create the "Gold" business-ready data.

    • Snapshots & Macros: Snapshots handle Slowly Changing Dimensions (SCD Type 2) to track history; macros are reusable code snippets that keep the project "DRY" (Don't Repeat Yourself).

    • dbt Docs: Automatically generated documentation that shows the Lineage (the map of how data flows from A to Z).

    3. The Transformation Lifecycle (The Refinery)

    The center of the image shows the flow from raw to refined:

    • Bronze (Raw): Untouched data.

    • Silver (Filtered): dbt applies filters, deduplication, and standardized naming conventions.

    • Gold (Standardized): The final, high-quality data. Here, dbt Tests (unique, not_null) act as "quality gates" to ensure no bad data reaches the end users.

    4. Downstream Sharing (The Value)

    The right side shows how different teams "check out" the finished product from the Gold layer:

    • BI Teams: Use SQL Warehouses to power dashboards.

    • Data Science: Accesses the same tables for Feature Stores (ML inputs).

    • AI/LLM Teams: Pull cleaned text into Vector Search indexes for RAG (Retrieval-Augmented Generation).

    • External Partners: Use Delta Sharing to access the data securely without needing to copy it to another system.

    ==============================================================
    Setting up dbt (data build tool) with Databricks is a powerful way to implement a "Medallion Architecture" (Bronze, Silver, Gold). Since Databricks handles the compute and dbt handles the transformation logic via SQL, you get a highly scalable, version-controlled data pipeline.

    1. Establishing the Connection

    Before writing SQL, you need to bridge the two platforms.

    • Databricks Side: Go to SQL Warehouses, select your warehouse, and navigate to Connection Details. You will need the Server Hostname, HTTP Path, and a Personal Access Token (PAT).

    • dbt Side: If using dbt Cloud, use the Databricks adapter. If using dbt Core, install dbt-databricks.

    • Profile Configuration: Your profiles.yml (or Cloud setup) should look like this:

      • Type: databricks

      • Method: token

      • Catalog: main (Unity Catalog is highly recommended for sharing).
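
Putting those bullets together, a minimal profiles.yml sketch for the dbt-databricks adapter might look like the following. The profile name, schema, and placeholder values are illustrative, and the exact keys can vary by adapter version, so check the dbt-databricks docs; copy the real values from your SQL Warehouse's Connection Details:

```yaml
my_databricks_project:          # illustrative profile name
  target: dev
  outputs:
    dev:
      type: databricks
      catalog: main                    # Unity Catalog
      schema: analytics                # illustrative default schema
      host: <server-hostname>          # from Connection Details
      http_path: <http-path>           # from Connection Details
      token: <personal-access-token>   # PAT; keep out of Git
```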

    2. Data Ingestion (The "Bronze" Layer)

    dbt is primarily a transformation tool, not an ingestion tool. However, you have two main paths:

    • External Ingestion: Use tools like Fivetran, Airbyte, or Azure Data Factory to land raw data into Delta tables.

    • dbt-as-Ingestion: Use dbt Seeds for small, static CSVs (like mapping tables) or define Sources that point to tables created by Databricks Auto Loader.

    3. Data Cleansing (The "Silver" Layer)

    This is where dbt shines. In this layer, you transform raw data into "clean" entities.

    • Filtering & Deduplication: Use SELECT DISTINCT or window functions to remove duplicates.

    • Casting: Ensure dates are DATE types and strings are trimmed.

    • Renaming: Use dbt models to change cryptic source column names (e.g., src_usr_id) to readable ones (user_id).

    • Implementation: Create a model in models/silver/ and use the ref() function to pull from your Bronze sources.

    4. Maintaining Data Standardization

    To keep your data "Golden," you must enforce quality and consistency across the organization.

    • dbt Tests: Define unique, not_null, and accepted_values in your schema.yml.

    • Macros: Use dbt macros for reusable logic. For example, if you have a specific way to calculate "Fiscal Year," write a macro once and call it in every model to ensure everyone uses the same math.

    • Packages: Utilize dbt_utils for complex cross-database testing and transformations.

    • Documentation: Run dbt docs generate to create a searchable data catalog so users know exactly what a column represents.

    5. Data Sharing to Downstream Teams

    Once your "Gold" layer is ready, you need to deliver it. This is where Unity Catalog on Databricks becomes your best friend.

    For Analytics & BI

    • Materialization: Set your Gold models to materialized='table' or incremental.

    • BI Tools: Tools like Tableau, Power BI, or Databricks SQL Dashboards connect directly to these Gold tables.

    For Data Science & ML

    • Feature Store: DS teams can use your dbt-cleansed tables as the foundation for the Databricks Feature Store.

    • Python Integration: ML engineers can pull dbt models directly into notebooks using: spark.table("main.gold.customer_lifetime_value")

    For AI & LLMs (RAG)

    • Vector Search: You can sync your dbt-processed text data into Databricks Vector Search indexes to power AI chatbots with clean, "truthful" organizational data.

    External Sharing

    • Delta Sharing: Use Databricks Delta Sharing to provide read-only access to your dbt models for partners or vendors outside your Databricks environment without moving the data.



    Thursday, March 26, 2026

    STEP-BY-STEP: How dbt Builds the DAG

     

    1. Parse files
    2. Identify refs/sources
    3. Build nodes
    4. Create dependency graph
    5. Validate DAG
    6. Topological sort
    7. Compile SQL
    8. Execute DAG
    9. Render UI graph
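
Steps 3–6 above can be sketched with the standard library's graphlib; the dependency graph and model names below are made up, and dbt's real scheduler is far more involved:

```python
from graphlib import TopologicalSorter

# Hypothetical parent_map: each node lists its upstream dependencies.
parent_map = {
    "raw.customers_raw": set(),
    "raw.orders_raw": set(),
    "stg_customers": {"raw.customers_raw"},
    "stg_orders": {"raw.orders_raw"},
    "dim_customers": {"stg_customers"},
    "fact_orders": {"stg_orders", "dim_customers"},
}

# static_order() raises CycleError for a cyclic graph — dbt's "Validate DAG" step.
order = list(TopologicalSorter(parent_map).static_order())

# The property dbt relies on: every node appears after all of its parents.
for node, parents in parent_map.items():
    assert all(order.index(p) < order.index(node) for p in parents)
```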


    dbt manifest.json internals

    A deep dive into dbt’s manifest.json: one of the most important internal artifacts in dbt and the brain of your project’s DAG.

    What is manifest.json?

    manifest.json is a compiled metadata file generated when you run:


    dbt run

    dbt compile

    dbt docs generate

    📌 It contains:

    • All models, sources, tests, macros
    • Dependency graph (DAG)
    • Compiled SQL
    • Column metadata
    • Lineage relationships

    👉 Think of it as:

    dbt project → parsed → compiled → manifest.json (single source of truth)

    📦 High-Level Structure

    {
      "metadata": {},
      "nodes": {},
      "sources": {},
      "macros": {},
      "parent_map": {},
      "child_map": {},
      "docs": {},
      "exposures": {},
      "metrics": {}
    }

    🔍 1. metadata Section

    🔹 What it contains

    "metadata": {
      "dbt_version": "1.x.x",
      "project_name": "retail_dbt",
      "generated_at": "timestamp",
      "adapter_type": "snowflake"
    }

    🔹 Why it matters

    • Tracks dbt version compatibility
    • Helps debugging pipeline issues

    🧩 2. nodes (CORE of dbt)

    This is the most important section.

    🔹 What are nodes?

    Everything dbt builds:

    • models
    • tests
    • seeds
    • snapshots

    🔹 Example: Model Node

    "model.retail_dbt.stg_customers": {
      "resource_type": "model",
      "name": "stg_customers",
      "database": "RETAIL_DB",
      "schema": "STAGING",
      "alias": "stg_customers",
      "raw_code": "SELECT * FROM {{ source('raw','customers_raw') }}",
      "compiled_code": "SELECT * FROM RETAIL_DB.RAW.CUSTOMERS_RAW",
      "depends_on": {
        "nodes": ["source.retail_dbt.raw.customers_raw"]
      },
      "config": { "materialized": "view" }
    }

    🔹 Key Fields Explained

    raw_code

    • Your original SQL with Jinja

    compiled_code

    • Final SQL sent to Snowflake

    depends_on

    • Defines DAG edges

    config

    • Materialization (view/table/incremental)

    🌐 3. sources

    🔹 Example

    "source.retail_dbt.raw.customers_raw": {
      "database": "RETAIL_DB",
      "schema": "RAW",
      "identifier": "CUSTOMERS_RAW"
    }

    🔹 Purpose

    • Maps dbt → physical tables
    • Enables lineage tracking

    🧬 4. parent_map (UPSTREAM)

    🔹 Example

    "model.retail_dbt.stg_customers": [ "source.retail_dbt.raw.customers_raw" ]

    🔹 Meaning

    • Who feeds into this model

    🔗 5. child_map (DOWNSTREAM)

    🔹 Example

    "model.retail_dbt.stg_customers": [ "model.retail_dbt.dim_customers" ]

    🔹 Meaning

    • Who depends on this model

    🧠 DAG Insight

    Together:

    parent_map + child_map = full DAG graph

    This powers:

    • dbt lineage UI
    • model execution order
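
A sketch of how child_map falls out of parent_map by inverting every edge (node ids are illustrative):

```python
# parent_map as it appears in manifest.json: node -> upstream dependencies.
parent_map = {
    "source.retail_dbt.raw.customers_raw": [],
    "model.retail_dbt.stg_customers": ["source.retail_dbt.raw.customers_raw"],
    "model.retail_dbt.dim_customers": ["model.retail_dbt.stg_customers"],
}

# child_map is the same graph with every edge reversed: node -> dependents.
child_map = {node: [] for node in parent_map}
for node, parents in parent_map.items():
    for parent in parents:
        child_map[parent].append(node)

assert child_map["model.retail_dbt.stg_customers"] == ["model.retail_dbt.dim_customers"]
```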

    🧰 6. macros

    🔹 Example

    "macro.dbt_utils.generate_surrogate_key": {
      "name": "generate_surrogate_key",
      "macro_sql": "md5(concat(...))"
    }

    🔹 Purpose

    • Stores reusable logic
    • Used during compilation

    🧪 7. tests

    Tests are stored inside the nodes section alongside models.

    🔹 Example

    "test.retail_dbt.unique_customer_id": {
      "resource_type": "test",
      "depends_on": {
        "nodes": ["model.retail_dbt.stg_customers"]
      }
    }

    🔹 Purpose

    • Defines validation logic
    • Linked to models

    📚 8. docs & exposures

    🔹 Docs

    • Column descriptions
    • Model descriptions

    🔹 Exposures

    "exposure.dashboard.sales_dashboard": {
      "type": "dashboard",
      "depends_on": {
        "nodes": ["model.retail_dbt.fact_orders"]
      }
    }

    👉 Connects dbt → BI tools like:

    • Tableau
    • Power BI

    HOW dbt USES manifest.json INTERNALLY

    Step-by-step:

    1. Parse Phase

    • Reads SQL + YAML → builds nodes

    2. Compile Phase

    • Resolves:
      • ref()
      • source()
      • macros

    3. DAG Build

    • Uses:
      • depends_on
      • parent_map

    4. Execution Planning

    • Orders models correctly

    5. Run Phase

    • Executes compiled_code

    6. Docs Generation

    • Uses manifest for lineage graph
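
To show how downstream tooling consumes the artifact, here is a toy Python read of a drastically trimmed, made-up manifest; a real target/manifest.json has the same top-level shape but far more fields:

```python
import json

# Made-up, trimmed manifest payload with the same top-level shape as the real file.
manifest = json.loads("""
{
  "nodes": {
    "model.retail_dbt.stg_customers": {
      "resource_type": "model",
      "compiled_code": "SELECT * FROM RETAIL_DB.RAW.CUSTOMERS_RAW",
      "depends_on": {"nodes": ["source.retail_dbt.raw.customers_raw"]}
    }
  },
  "parent_map": {
    "model.retail_dbt.stg_customers": ["source.retail_dbt.raw.customers_raw"]
  }
}
""")

node = manifest["nodes"]["model.retail_dbt.stg_customers"]
# The DAG edge is stored in both depends_on and parent_map, and they agree.
assert node["depends_on"]["nodes"] == manifest["parent_map"]["model.retail_dbt.stg_customers"]
```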

    🎯 Key Takeaways

    • manifest.json = central brain of dbt
    • Stores:
      • models
      • dependencies
      • compiled SQL
    • Powers:
      • execution
      • lineage
      • CI/CD
    • Essential for:
      • debugging
      • optimization
      • orchestration

    production-level walkthrough

     Below is a deep, production-level walkthrough of each step in your dbt pipeline—from raw → staging → incremental → SCD → marts → testing → deployment—including:

    • dbt internal components involved
    • SQL / Python role
    • Why the step exists
    • How Git supports each step

    STEP 1: RAW DATA EXTRACT (Sources)

    🔹 What happens

    • Data lands in Snowflake raw schema via:
      • COPY INTO / Snowpipe
    • dbt does NOT ingest, but registers sources

    🔹 dbt Internal Components

    • Source Config (sources.yml)
    • Parser
    • Adapter (Snowflake adapter)

    🔹 SQL / Python Role

    SELECT * FROM {{ source('raw','customers_raw') }}

    • SQL references raw tables
    • Python (optional): external ingestion pipelines

    🔹 Why required

    • Establish trusted entry point
    • Enables lineage tracking
    • Prevents direct table coupling

    🔹 Git Role

    • Version control for:
      • sources.yml
    • Tracks schema evolution
    • Enables rollback if source changes break the pipeline.

    STEP 2: STAGING LAYER

    🔹 What happens

    • Clean, standardize, rename columns
    • Apply light transformations

    🔹 dbt Internal Components

    • Model Parser
    • Jinja Engine (macros)
    • Compiler
    • Execution Engine

    🔹 SQL Example

    SELECT
      customer_id,
      LOWER(email) AS email,
      TRIM(name) AS name
    FROM {{ source('raw','customers_raw') }}

    🔹 Why required

    • Removes inconsistencies
    • Creates reusable canonical layer

    🔹 Git Role

    • Tracks column renaming decisions
    • Enables code review for transformations
    • Collaboration across teams

    STEP 3: INCREMENTAL MODELS

    🔹 What happens

    • Load only new or changed data
    • Avoid full table rebuilds

    🔹 dbt Internal Components

    • Materialization Engine (incremental)
    • Macro Engine (is_incremental())
    • Adapter-specific SQL generation

    🔹 SQL Example

    SELECT *
    FROM {{ ref('stg_customers') }}
    {% if is_incremental() %}
    WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
    {% endif %}

    🔹 Why required

    • Improves performance
    • Reduces compute cost in Snowflake

    🔹 Git Role

    • Tracks logic changes for incremental filters
    • Prevents accidental full refresh bugs
    • Enables safe experimentation via branches

    STEP 4: SCD TYPE 2 (HISTORICAL TRACKING)

    🔹 What happens

    • Tracks historical changes in dimensions
    • Maintains:
      • surrogate keys
      • effective dates
      • current flag

    🔹 dbt Internal Components

    • Custom Macros (SCD logic)
    • Incremental + MERGE strategy
    • Dependency Graph (ref)

    🔹 SQL Example (MERGE)

    MERGE INTO dim_customers t
    USING stg_customers s
    ON t.customer_id = s.customer_id AND t.is_current = TRUE
    WHEN MATCHED AND t.email <> s.email THEN
      UPDATE SET is_current = FALSE, effective_end_date = CURRENT_TIMESTAMP
    WHEN NOT MATCHED THEN
      INSERT (...)

    🔹 Why required

    • Enables:
      • historical reporting
      • audit tracking
      • slowly changing dimensions

    🔹 Git Role

    • Critical for:
      • tracking SCD logic changes
      • auditing business rule evolution
    • Enables peer review for complex logic
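
The SCD Type 2 rule expressed by the MERGE above can be sketched in plain Python. Row shapes and field names are illustrative; in dbt this logic runs in-warehouse via an incremental MERGE or a dbt snapshot:

```python
from datetime import datetime, timezone

def scd2_apply(dim, staged, now):
    """Apply one staged row to a list of dimension rows, SCD Type 2 style."""
    for row in dim:
        if row["customer_id"] == staged["customer_id"] and row["is_current"]:
            if row["email"] == staged["email"]:
                return dim  # unchanged: keep the current version as-is
            row["is_current"] = False          # expire the old version
            row["effective_end_date"] = now
            break
    # New key, or changed row: insert a fresh current version.
    dim.append({**staged, "is_current": True, "effective_end_date": None})
    return dim

dim = [{"customer_id": 1, "email": "a@example.com",
        "is_current": True, "effective_end_date": None}]
now = datetime(2026, 3, 26, tzinfo=timezone.utc)
scd2_apply(dim, {"customer_id": 1, "email": "b@example.com"}, now)
```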

    STEP 5: DATA MART (GOLD LAYER)

    🔹 What happens

    • Build fact and dimension tables
    • Optimize for BI queries

    🔹 dbt Internal Components

    • Ref Graph (DAG builder)
    • Materializations (table/view)
    • Query Planner

    🔹 SQL Example

    SELECT
      o.order_id,
      c.customer_sk,
      o.order_amount
    FROM {{ ref('fact_orders') }} o
    JOIN {{ ref('dim_customers') }} c
      ON o.customer_sk = c.customer_sk  -- join key assumed for illustration

    🔹 Why required

    • Serves:
      • dashboards
      • analytics
    • Improves performance via denormalization

    🔹 Git Role

    • Tracks business logic
    • Maintains consistency across metrics
    • Enables versioned analytics definitions

    STEP 6: TESTING & VALIDATION

    🔹 What happens

    • Validate data quality:
      • uniqueness
      • null checks
      • referential integrity

    🔹 dbt Internal Components

    • Test Runner
    • Schema YAML parser
    • Assertion engine

    🔹 YAML Example

    columns:
      - name: customer_id
        tests:
          - not_null
          - unique

    🔹 Why required

    • Prevents bad data propagation
    • Ensures trust in analytics

    🔹 Git Role

    • Tracks test coverage
    • Enforces quality via CI/CD
    • Prevents bad merges

    STEP 7: DOCUMENTATION & DEPLOYMENT

    🔹 What happens

    • Generate lineage docs
    • Deploy models via jobs

    🔹 dbt Internal Components

    • Docs Generator
    • Manifest.json
    • Run Artifacts
    • DAG Renderer

    🔹 Commands

    dbt docs generate
    dbt docs serve

    🔹 Why required

    • Provides:
      • lineage graph
      • column-level documentation
    • Helps onboarding & governance

    🔹 Git Role

    • Stores:
      • documentation YAML
    • Enables:
      • versioned lineage
      • auditability

    Internal Flow inside dbt:

    1. Parser
      • Reads SQL + YAML
    2. Jinja/Macro Engine
      • Expands dynamic logic
    3. Compiler
      • Converts to raw SQL
    4. Adapter (Snowflake)
      • Optimizes SQL
    5. Execution Engine
      • Runs queries
    6. Test Engine
      • Validates output
    7. Docs Generator
      • Builds lineage + metadata

    RAW → SOURCE → STAGING → INCREMENTAL → SCD → MART → TEST → DOCS

    🚀 Key Takeaways

    • dbt is a compiler + orchestrator, not an ingestion tool
    • SQL is the primary transformation layer
    • Python is optional (advanced models)
    • Git enables:
      • collaboration
      • versioning
      • CI/CD
    • Each step builds trust, performance, and scalability

    Inside dbt: How data moves from Raw to Refined

     



    dbt internal components: dbt models using SQL or Python

    This diagram presents a high-level architectural blueprint of how dbt Core acts as the orchestration and transformation layer between raw source data and a formalized data warehouse. While the diagram makes it look like data "flows through" dbt, the fundamental takeaway for an architect is that dbt only handles the logic and orchestration (the control plane); the data processing (the data plane) is pushed down entirely to the target warehouse.

    1. Source Data (Raw)

    • The Ingestion Layer: This represents the raw, "bronze-layer" data that has been landed in your cloud data warehouse (e.g., Snowflake, BigQuery) by your ELT tools (like Fivetran, Airbyte, or custom Python scripts).

    • SME Insight: From a dbt perspective, these are entirely external tables that dbt does not manage. We only interact with them through {{ source() }} macros, which allow us to define their location and properties (like freshness) in YAML.

    • Architectural Role: It is the foundational substrate for all downstream transformation. The data must be accessible by the dbt adapter’s credentials.

    2. dbt Project & Model Definitions (.sql & .py)

    • The Blueprint: This is the developer's workspace—the codebase managed in Git. It contains all the instructions for how the data should be modeled.

    • SQL Model: The primary declarative method. It uses Select Statements combined with Jinja templating to create modular, reusable logic. config() macros at the top define how the database object will be materialized (as a table, view, etc.).

    • Python Model (SME Note): Introduced more recently, these models allow data teams to leverage Python libraries (like Pandas or scikit-learn) for operations that are impossible or highly complex in SQL (like machine learning inference or advanced statistical transformations). They are typically executed in the target warehouse's native Python environment (e.g., Snowflake Snowpark or BigQuery’s pandas API).

    • Principal Engineer View: The dbt project is where the principles of Software Engineering (version control, modularity, DRY code) are applied to Data Engineering.

    3. Manifest/Parser

    • The Project Cataloger: The Manifest/Parser is the "brain" that reads the entire project. It scans every .sql, .py, and .yml file in your repository to build an in-memory inventory of your models, tests, seeds, sources, and snapshots.

    • Dependency Resolution (DAG): Its critical output is the creation of the Directed Acyclic Graph (DAG). By reading the {{ ref() }} and {{ source() }} macros, it determines the order in which models must be built. If model_B refs model_A, it knows to build model_A first.

    • Validation: The parser validates the syntax, project structure, and config settings to ensure the project is "runnable." It’s the gatekeeper. If the parser finds an error, dbt stops.

    4. Jinja Renderer

    • The Dynamic Engine: The parser identified where {{ ref() }} was used, but the Jinja Renderer is what actually compiles it. This is where the magic of "declarative SQL" comes from.

    • Environment Context: The renderer replaces abstract references like {{ ref('customers') }} with the fully qualified, environment-specific database paths (e.g., prod_db.core.customers in production, or dev_db.alice_schema.customers in development). This single feature makes code-switching between environments seamless.

    • Logic Execution: It processes {% if %}, {% for %}, and custom macros to dynamically generate different SQL or configurations based on input variables or environment settings.
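
The environment-context step can be sketched as a toy resolver: swap an abstract `{{ ref('...') }}` for an environment-specific relation. Everything here (target names, schemas, the regex) is illustrative; dbt's real renderer is a full Jinja engine:

```python
import re

# Hypothetical target configs, standing in for profiles.yml targets.
TARGETS = {
    "dev":  {"database": "dev_db",  "schema": "alice_schema"},
    "prod": {"database": "prod_db", "schema": "core"},
}

def render_refs(sql: str, target: str) -> str:
    """Replace {{ ref('model') }} with a fully qualified relation for the target."""
    cfg = TARGETS[target]
    return re.sub(
        r"\{\{\s*ref\('([^']+)'\)\s*\}\}",
        lambda m: f"{cfg['database']}.{cfg['schema']}.{m.group(1)}",
        sql,
    )

raw = "SELECT * FROM {{ ref('customers') }}"
assert render_refs(raw, "prod") == "SELECT * FROM prod_db.core.customers"
```

The same model code compiles to different relations per target, which is what makes dev/prod switching seamless.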

    5. Compilation Engine

    • The Transpiler: Now that Jinja has replaced the abstract terms, the Compilation Engine takes over. Its job is to generate the final, pure-dialect SQL that is ready to be executed.

    • Wrapping the Logic: This engine takes your SELECT query and wraps it in the necessary DDL/DML, depending on the materialization setting. For a materialized='table', it will wrap your SELECT in a CREATE OR REPLACE TABLE AS.... For an incremental model, it will generate a complex MERGE or INSERT statement based on the logic you defined.

    • Adapter Awareness: The compilation engine uses the Target DB Adapters (see below) to know exactly what SQL dialect (Snowflake, BigQuery, Postgres) to generate.
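
The wrapping step can be sketched as a tiny function: take the model's SELECT and wrap it in DDL according to its materialization. Real materializations are adapter-shipped Jinja macros with far more logic (transactions, schema changes, incremental MERGEs); names here are illustrative:

```python
def wrap(select_sql: str, relation: str, materialized: str) -> str:
    """Wrap a model's SELECT in the DDL implied by its materialization."""
    if materialized == "table":
        return f"CREATE OR REPLACE TABLE {relation} AS\n{select_sql}"
    if materialized == "view":
        return f"CREATE OR REPLACE VIEW {relation} AS\n{select_sql}"
    raise ValueError(f"unsupported materialization: {materialized}")

ddl = wrap("SELECT * FROM prod_db.core.stg_customers",
           "prod_db.core.dim_customers", "table")
assert ddl.startswith("CREATE OR REPLACE TABLE prod_db.core.dim_customers AS")
```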

    6. Artifacts Generator

    • The Persistent Memory: As dbt runs, it saves its state and knowledge into two main files:

      • manifest.json: A full, JSON-readable dump of everything the Manifest/Parser found. This is dbt’s "project-at-rest" state. It is used downstream by other tools and for State-Based Execution, allowing dbt to run only the models that have changed since the last run.

      • catalog.json: A JSON file generated after a dbt docs generate command. It contains the schema, column definitions, and data types from the target warehouse for all managed models.

    • Metadata Foundation: These artifacts are essential for dbt’s automated documentation website and for integration with modern data catalogs.

    7. Execution Engine (dbt-core)

    • The Conductor: The Execution Engine is the final orchestrator. It doesn't perform data work itself, but it knows when and where that work should happen.

    • Job Management: It takes the generated SQL/Python from the Compilation Engine and dispatches it to the cloud data warehouse via the established database connection. It manages the threads (concurrency), sending parallel queries to the warehouse when the DAG allows.

    • SME Insight: The Execution Engine handles the lifecycle of the job. It establishes the connection, starts the query, waits for the result, records performance metrics (start time, end time, status), and handles the commit/rollback logic on completion or error. It is the component that makes dbt a full-fledged deployment coordinator.

    8. Target DB Adapters (Snowflake, BigQuery, etc.)

    • The Translation Layer: This is the abstraction layer that allows dbt Core to remain dialect-agnostic. The adapters (e.g., dbt-snowflake, dbt-bigquery) translate abstract dbt commands into the specific API calls and SQL dialects required by that database.

    • Database-Specific Operations: The adapter knows that a table creation statement in Snowflake is different from BigQuery. It handles the nuances of connection authentication, data types, and transactional control.

    9. Transformed Data Models & Transformation Execution

    • The Destination: This is the finalized, modeled data that is now ready for analytics (the "gold-layer").

    • Pushed-Down Execution: This is the single most critical concept. All data movement and processing occur entirely inside the Target Cloud Data Warehouse.

      • SQL models are executed as CREATE TABLE AS... or INSERT/MERGE SQL commands directly against the warehouse engine.

      • Python models are packaged and sent to the warehouse’s internal runtime (like Snowpark UDFs) where they execute in specialized, serverless compute containers close to the data.

    • Output: The result is high-quality, trusted tables like dim_customers and fct_orders, which are query-ready for your BI tools.