dbt Cheat Sheet — UTA | Unified Tech Academy



DBT Cheat Sheet

Beginner

What is dbt, init, run, build — core CLI and concepts.

Beginner

What is dbt?

dbt (data build tool) runs SQL in your warehouse. You write SELECTs; dbt handles DDL/DML, dependencies, and testing.

Models are .sql files; ref() builds the DAG. Use for transforms, docs, and data quality.

Beginner

dbt init & run

Create a project and compile/run models.

CLI

dbt init my_project
cd my_project
dbt run
dbt build   # run + test

💡 Configure profiles.yml with your warehouse connection first.

Beginner

Project layout

Standard folders in a dbt project.

Layout

my_project/
  dbt_project.yml
  models/
  seeds/
  snapshots/
  tests/
  macros/
  target/

Project & config

dbt_project.yml and model config (schema, materialization, tags).

Beginner

dbt_project.yml

Project name, profile, and default config.

YAML

name: my_project
profile: my_project
model-paths: [models]
target-path: target

models:
  my_project:
    +schema: analytics
    +materialized: table

Beginner

Config in model (config block)

Override schema, materialization, or add tags per model.

SQL

{{ config(
  materialized='view',
  schema='staging',
  tags=['daily']
) }}

SELECT * FROM {{ ref('raw_orders') }}

Beginner

Seeds

CSV files in seeds/ loaded into the warehouse. Run dbt seed to load them; then ref() in models.

CLI

dbt seed
# In model: FROM {{ ref('country_codes') }}

Models & ref()

Reference other models with ref(); dbt builds the DAG and runs in order.

Beginner

Simple model

A model is a .sql file that compiles to a single SELECT. The filename (minus .sql) is the model name.

SQL

-- models/stg_orders.sql
SELECT id, customer_id, amount, created_at
FROM {{ ref('raw_orders') }}

Beginner

ref() and source()

ref(‘model_name’) references a dbt model. source(‘source_name’,’table_name’) references raw tables you define in YAML.

SQL

SELECT *
FROM {{ source('stripe', 'payments') }}
UNION ALL
SELECT * FROM {{ ref('stg_orders') }}

Sources

Define raw tables and optional freshness checks.

Intermediate

sources in YAML

List sources (database/schema/table) so you can use source() and run source freshness.

YAML

sources:
  - name: stripe
    database: raw
    schema: payments
    tables:
      - name: payments
      - name: customers

Intermediate

Source freshness

Warn or error if loaded_at (or chosen column) is older than threshold.

YAML

sources:
  - name: stripe
    freshness:
      warn_after: { count: 12, period: hour }
      error_after: { count: 24, period: hour }
    loaded_at_field: _loaded_at
    tables:
      - name: payments

Run: dbt source freshness

Tests

Generic (unique, not_null, etc.) and singular tests.

Intermediate

Generic tests (schema)

Add tests under the model in schema.yml. unique, not_null, accepted_values, relationships.

YAML

models:
  - name: dim_customers
    columns:
      - name: id
        tests: [unique, not_null]
      - name: status
        tests:
          - accepted_values:
              values: [active, churned]

Intermediate

relationships test

Ensure a column references a key in another model (referential integrity).

YAML

- name: customer_id
  tests:
    - relationships:
        to: ref('dim_customers')
        field: id

Intermediate

Singular test (SQL)

Custom test: a .sql file that returns rows that should be zero (fail if any row returned).

SQL

-- tests/amount_positive.sql
SELECT * FROM {{ ref('orders') }}
WHERE amount < 0

Jinja

Loops, conditionals, variables — use in .sql and .yml.

Intermediate

var() and env

Use variables from dbt_project.yml or command line (–vars ‘key: value’).

SQL

SELECT * FROM {{ ref('events') }}
WHERE date >= '{{ var("start_date", "2024-01-01") }}'

Intermediate

for loop (union tables)

Generate SQL over a list — e.g. union multiple tables.

SQL

{% for src in var('regions') %}
  SELECT * FROM {{ ref(src) }}
  {% if not loop.last %}UNION ALL{% endif %}
{% endfor %}

Intermediate

if / set

Conditional logic and local variables in Jinja.

SQL

{% set limit_rows = var('limit', 1000) %}
SELECT * FROM {{ ref('big_table') }}
{% if limit_rows %} LIMIT {{ limit_rows }} {% endif %}

Incremental models

Append or merge only new rows; full refresh with dbt run –full-refresh.

Advanced

Incremental (append)

First run: build full table. Next runs: append only rows that match the filter (e.g. new dates).

SQL

{{ config(materialized='incremental') }}

SELECT id, created_at, amount
FROM {{ ref('stg_orders') }}
{% if is_incremental() %}
  WHERE created_at > (SELECT MAX(created_at) FROM {{ this }})
{% endif %}

💡 this is the current model’s table. Use unique_key for merge strategy on some adapters.

Snapshots (SCD Type 2)

Track changes over time with valid_from / valid_to; run dbt snapshot.

Advanced

Snapshot definition

Snapshot a source or ref; dbt adds dbt_valid_from, dbt_valid_to and updates rows on each run.

SQL

{% snapshot orders_snapshot %}

{{ config(
  target_schema='snapshots',
  unique_key='id',
  strategy='timestamp',
  updated_at='updated_at'
) }}

SELECT * FROM {{ ref('stg_orders') }}

{% endsnapshot %}

Docs & Hooks

Documentation blocks and pre/post-hooks (e.g. grants, vacuum).

Intermediate

docs block

Add a doc block and reference it in schema.yml for rich docs.

SQL / YAML

{% docs order_amount_desc %}
Total order amount in USD.
{% enddocs %}

In schema: description: "{{ doc('order_amount_desc') }}". Then dbt docs generate and dbt docs serve.

Intermediate

Hooks (pre / post)

Run SQL before or after a model runs (e.g. grants, vacuum, analyze).

YAML

models:
  - name: dim_customers
    post-hook:
      - "GRANT SELECT ON {{ this }} TO role analyst"
      - "ANALYZE TABLE {{ this }}"

Ultra Tendency Academy — dbt Cheat Sheet