What is dbt_codegen (Quick Context)
dbt_codegen is a macro package that auto-generates repetitive dbt code (SQL + YAML) by introspecting your warehouse metadata
๐ It reduces:
- Manual boilerplate
- Human error in YAML/docs/tests
- Time-to-model for new datasets
Core Use Cases (with Architecture Thinking)
1) ๐งฑ Generate Source YAML from Raw Tables
Why it matters
- Standardizes ingestion layer
- Enables lineage + freshness + testing
Example:
{{ codegen.generate_source(
schema_name='raw',
database_name='analytics_db'
) }}
Output (auto-generated)
</> YAML
version: 2
sources:
- name: raw
tables:
- name: customers
- name: orders
Architect Insight
- Use this when onboarding new schemas from ingestion tools (Fivetran, Airbyte)
- Combine with:
- freshness tests
- source-level SLAs
2) Generate Base Models (Staging Layer)
Why it matters
- Enforces consistent staging patterns
- Accelerates Medallion architecture (Bronze → Silver)
Example
</> SQL
{{ codegen.generate_base_model(
source_name='raw',
table_name='orders'
) }}
Output
</> SQL
select
id,
customer_id,
order_date,
amount
from {{ source('raw', 'orders') }}
Architect Insight
-
Pair with naming convention:
stg_<source>__<table> -
Add:
- column casting
- null handling
- deduplication logic
3) Generate Model YAML (Docs + Tests)
Why it matters
- Ensures every model is documented and testable
- Critical for governance + data contracts
Example
</> SQL
{{ codegen.generate_model_yaml(
model_names=['stg_orders']
) }}
Output
</> YAML
version: 2
models:
- name: stg_orders
columns:
- name: id
tests:
- not_null
- unique
Architect Insight
-
Use this to enforce:
- Data quality SLAs
- Contract-first modeling
- Extend with:
-
accepted_values -
relationships
4) Generate Column Lists Dynamically
Why it matters
-
Avoids
SELECT * - Prevents breakage when schema evolves
Example
</> SQL
select
{{ codegen.get_columns_in_relation(
ref('stg_orders')
) }}
from {{ ref('stg_orders') }}
Output
</> SQL
select id, customer_id, order_date, amount
Architect Insight
-
Combine with:
- incremental models
- snapshot logic
- Helps with schema drift handling
5) ๐งช Generate Generic Tests Quickly
Why it matters
- Accelerates data quality adoption
Example
</> SQL
{{ codegen.generate_model_yaml(
model_names=['fct_orders'],
include_tests=True
) }}
Architect Insight
- Use for:
- onboarding new teams
- enforcing platform-wide testing standards
6) ๐ Automate Documentation at Scale
Why it matters
- Documentation is often neglected → dbt_codegen fixes that
Example Workflow
- Run macro
-
Paste into
.yml - Add descriptions later
Architect Insight
- Integrate into CI/CD:
- auto-generate → PR → review
7) ๐งฉ Metadata-Driven Development
Why it matters
- Moves toward declarative analytics engineering
Example Pattern
- Introspect schema → generate models → enrich logic
Architect Insight
-
Combine with:
- information_schema queries
- data catalogs
- AI-assisted model generation
๐ท How dbt_codegen Works in dbt Cloud (Internals)
Step-by-step Flow
1) Metadata Introspection
-
Queries warehouse system tables
(information_schema.columns)
2) Macro Execution
-
Jinja macros build:
- SQL
- YAML
- Tests
3) Output Rendering
-
Output appears in:
- dbt Cloud IDE
- CLI logs
4) Manual/Automated Integration
- Developer copies OR pipelines inject into repo
No comments:
Post a Comment