DBT Cheat Sheet
Beginner
What is dbt, init, run, build — core CLI and concepts.
What is dbt?
dbt (data build tool) runs SQL in your warehouse. You write SELECTs; dbt handles DDL/DML, dependencies, and testing.
Models are .sql files; ref() builds the DAG. Use for transforms, docs, and data quality.
dbt init & run
Create a project and compile/run models.
CLI
dbt init my_project
cd my_project
dbt run
dbt build # run + test
💡 Configure profiles.yml with your warehouse connection first.
Project layout
Standard folders in a dbt project.
Layout
my_project/ dbt_project.yml models/ seeds/ snapshots/ tests/ macros/ target/
Project & config
dbt_project.yml and model config (schema, materialization, tags).
dbt_project.yml
Project name, profile, and default config.
YAML
name: my_project profile: my_project model-paths: [models] target-path: target models: my_project: +schema: analytics +materialized: table
Config in model (config block)
Override schema, materialization, or add tags per model.
SQL
{{ config(
materialized='view',
schema='staging',
tags=['daily']
) }}
SELECT * FROM {{ ref('raw_orders') }}
Seeds
CSV files in seeds/ loaded into the warehouse. Run dbt seed to load them; then ref() in models.
CLI
dbt seed
# In model: FROM {{ ref('country_codes') }}
Models & ref()
Reference other models with ref(); dbt builds the DAG and runs in order.
Simple model
A model is a .sql file that compiles to a single SELECT. The filename (minus .sql) is the model name.
SQL
-- models/stg_orders.sql SELECT id, customer_id, amount, created_at FROM {{ ref('raw_orders') }}
ref() and source()
ref(‘model_name’) references a dbt model. source(‘source_name’,’table_name’) references raw tables you define in YAML.
SQL
SELECT * FROM {{ source('stripe', 'payments') }} UNION ALL SELECT * FROM {{ ref('stg_orders') }}
Sources
Define raw tables and optional freshness checks.
sources in YAML
List sources (database/schema/table) so you can use source() and run source freshness.
YAML
sources: - name: stripe database: raw schema: payments tables: - name: payments - name: customers
Source freshness
Warn or error if loaded_at (or chosen column) is older than threshold.
YAML
sources: - name: stripe freshness: warn_after: { count: 12, period: hour } error_after: { count: 24, period: hour } loaded_at_field: _loaded_at tables: - name: payments
Run: dbt source freshness
Tests
Generic (unique, not_null, etc.) and singular tests.
Generic tests (schema)
Add tests under the model in schema.yml. unique, not_null, accepted_values, relationships.
YAML
models: - name: dim_customers columns: - name: id tests: [unique, not_null] - name: status tests: - accepted_values: values: [active, churned]
relationships test
Ensure a column references a key in another model (referential integrity).
YAML
- name: customer_id tests: - relationships: to: ref('dim_customers') field: id
Singular test (SQL)
Custom test: a .sql file that returns rows that should be zero (fail if any row returned).
SQL
-- tests/amount_positive.sql SELECT * FROM {{ ref('orders') }} WHERE amount < 0
Jinja
Loops, conditionals, variables — use in .sql and .yml.
var() and env
Use variables from dbt_project.yml or command line (–vars ‘key: value’).
SQL
SELECT * FROM {{ ref('events') }} WHERE date >= '{{ var("start_date", "2024-01-01") }}'
for loop (union tables)
Generate SQL over a list — e.g. union multiple tables.
SQL
{% for src in var('regions') %}
SELECT * FROM {{ ref(src) }}
{% if not loop.last %}UNION ALL{% endif %}
{% endfor %}
if / set
Conditional logic and local variables in Jinja.
SQL
{% set limit_rows = var('limit', 1000) %}
SELECT * FROM {{ ref('big_table') }}
{% if limit_rows %} LIMIT {{ limit_rows }} {% endif %}
Incremental models
Append or merge only new rows; full refresh with dbt run –full-refresh.
Incremental (append)
First run: build full table. Next runs: append only rows that match the filter (e.g. new dates).
SQL
{{ config(materialized='incremental') }}
SELECT id, created_at, amount
FROM {{ ref('stg_orders') }}
{% if is_incremental() %}
WHERE created_at > (SELECT MAX(created_at) FROM {{ this }})
{% endif %}
💡 this is the current model’s table. Use unique_key for merge strategy on some adapters.
Snapshots (SCD Type 2)
Track changes over time with valid_from / valid_to; run dbt snapshot.
Snapshot definition
Snapshot a source or ref; dbt adds dbt_valid_from, dbt_valid_to and updates rows on each run.
SQL
{% snapshot orders_snapshot %}
{{ config(
target_schema='snapshots',
unique_key='id',
strategy='timestamp',
updated_at='updated_at'
) }}
SELECT * FROM {{ ref('stg_orders') }}
{% endsnapshot %}
Docs & Hooks
Documentation blocks and pre/post-hooks (e.g. grants, vacuum).
docs block
Add a doc block and reference it in schema.yml for rich docs.
SQL / YAML
{% docs order_amount_desc %}
Total order amount in USD.
{% enddocs %}
In schema: description: "{{ doc('order_amount_desc') }}". Then dbt docs generate and dbt docs serve.
Hooks (pre / post)
Run SQL before or after a model runs (e.g. grants, vacuum, analyze).
YAML
models: - name: dim_customers post-hook: - "GRANT SELECT ON {{ this }} TO role analyst" - "ANALYZE TABLE {{ this }}"
