Data Quality & Testing
Tests, freshness, anomaly detection, contracts. How data teams build trust — and lose it.
Three layers of data quality
- Schema / structural — does the data have the right shape? (column existence, types, nullability)
- Logical / business rules — does the data satisfy expected invariants? (primary keys unique, foreign keys resolve, values in expected set, business arithmetic holds)
- Statistical / distributional — does the data look right? (row counts in expected range, column distributions stable, no anomalous spikes/drops)
Different tools handle each. dbt tests cover layers 1 and 2 well. Layer 3 needs observability tooling — Monte Carlo, Anomalo, Elementary, or custom anomaly checks.
Tests — dbt and beyond
The dbt built-ins
- unique — column has no duplicates.
- not_null — column has no nulls.
- accepted_values — column values are within a list.
- relationships — every value in column A exists in column B of another table (foreign key check).
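On a schema.yml these stack per column; a minimal sketch, assuming hypothetical stg_orders and dim_customers models:

models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'returned']
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id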
dbt-utils additions
- unique_combination_of_columns — composite unique key.
- not_null_proportion — at least X% of rows are non-null.
- recency — most recent row is within X hours.
- at_least_one — table has at least one row.
- expression_is_true — arbitrary boolean expression holds for every row.
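Two of these in use (model and column names are assumptions):

models:
  - name: fct_order_items
    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - order_id
            - line_number
      - dbt_utils.expression_is_true:
          expression: "quantity > 0"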
dbt-expectations (Great Expectations port)
Distribution tests, regex tests, statistical bounds. Useful for layer 3 checks within dbt.
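For example, a bounds check from the package (model, column, and limits are illustrative):

models:
  - name: fct_orders
    columns:
      - name: revenue
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 100000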
Singular tests
Custom SQL in tests/ that returns failing rows. Empty result = pass. Use for business-logic checks no built-in covers:
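-- Flag orders whose header revenue disagrees with the summed line-item revenue by more than a cent: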
SELECT
o.order_id,
o.revenue AS order_revenue,
SUM(oi.line_revenue) AS line_revenue_sum
FROM {{ ref('fct_orders') }} o
JOIN {{ ref('fct_order_items') }} oi USING (order_id)
GROUP BY o.order_id, o.revenue
HAVING ABS(o.revenue - SUM(oi.line_revenue)) > 0.01
Freshness
Data that's stale lies. Worse, it lies silently — the dashboard looks the same, the numbers just stop moving.
dbt source freshness
sources:
  - name: raw_app
    loaded_at_field: _ingested_at
    freshness:
      warn_after: {count: 2, period: hour}
      error_after: {count: 6, period: hour}
    tables:
      - name: orders
Run dbt source freshness in CI or a scheduled job. Page or alert on errors.
Model freshness
Test that downstream models have recent data:
models:
  - name: fct_orders
    tests:
      - dbt_utils.recency:
          datepart: hour
          field: ordered_at
          interval: 2
Anomaly detection
Tests catch boolean invariants. They don't catch "revenue dropped 60% today and no one noticed." That's anomaly territory.
Built-in patterns
- Row count anomalies — today's row count is N standard deviations from the trailing average.
- Schema anomalies — column added / removed / type changed unexpectedly.
- Distribution anomalies — column's distribution shifted (mean, p50, p95).
- NULL anomalies — null proportion changed.
Tools
- Elementary — open-source, dbt-native. Stores test results + anomaly detection in your warehouse.
- Monte Carlo, Anomalo, Bigeye — commercial, ML-driven, less config.
- Roll your own — a few dbt-utils tests plus scheduled queries can do a lot; see the sketch below.
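A sketch of the roll-your-own route as a singular test: fail when today's row count lands more than three standard deviations from the trailing 28-day average (model, date column, window, and threshold are all assumptions, and date arithmetic varies by warehouse):

WITH daily_counts AS (
    -- Row count per day for the trailing window plus today
    SELECT
        CAST(ordered_at AS DATE) AS day,
        COUNT(*) AS row_count
    FROM {{ ref('fct_orders') }}
    WHERE ordered_at >= CURRENT_DATE - 29
    GROUP BY 1
),
baseline AS (
    -- Trailing stats exclude today so an anomaly can't mask itself
    SELECT
        AVG(row_count) AS avg_count,
        STDDEV(row_count) AS stddev_count
    FROM daily_counts
    WHERE day < CURRENT_DATE
)
SELECT d.day, d.row_count, b.avg_count, b.stddev_count
FROM daily_counts d
CROSS JOIN baseline b
WHERE d.day = CURRENT_DATE
  AND ABS(d.row_count - b.avg_count) > 3 * b.stddev_count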
Data contracts
An explicit, versioned agreement between data producer and consumer. The producer commits to: these columns, these types, this freshness, these uniqueness guarantees. The consumer commits to: only relying on what's documented.
Implementation:
- dbt model contracts — declare expected columns + types in YAML; CI fails if the model output doesn't match. See the sketch after this list.
- Schema registry for streaming events (Confluent, Glue Schema Registry).
- Backstage / data catalogs for org-wide visibility.
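A minimal sketch of a dbt model contract (dbt 1.5+; an enforced contract requires every output column to be declared, only two are shown here):

models:
  - name: fct_orders
    config:
      contract:
        enforced: true
    columns:
      - name: order_id
        data_type: bigint
        constraints:
          - type: not_null
      - name: revenue
        data_type: numeric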
The cultural piece matters as much as the technical piece. Contracts shift "data team gets blindsided when an engineer drops a column" to "producer can't ship a breaking change without failing the build."
Severity strategy
If every test is "error severity, page on-call," tests become noise and get ignored. Smart severity:
- Severity error — true contracts. Page if violated. Examples: PK uniqueness, freshness on source data, business rules where wrong data = wrong decisions.
- Severity warn — signal, not gate. Investigate when convenient. Examples: distribution shifts, low-importance accepted-values lists, soft thresholds.
- Threshold-based errors — allow some failures before escalating. Example: error_if: ">100" means warn at 1-100 failing rows, error at 101+ (warn_if stays at its default). A config sketch follows this list.
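A sketch of that threshold config on a built-in test (model and column are hypothetical):

models:
  - name: fct_orders
    columns:
      - name: customer_id
        tests:
          - not_null:
              config:
                severity: error    # escalate when error_if is met
                error_if: ">100"   # 101+ failing rows error; fewer fall back to a warn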
One anti-pattern to avoid: adding tests without owners. A test that fails and no one fixes is noise. Every test should have a clear owner in the model's YAML and a runbook for what to do when it fires.
Incident response
When a quality alert fires, the discipline is:
- Triage — is it real, or a flaky test? Look at the failing rows.
- Communicate — if downstream dashboards are affected, post in the data channel. Don't wait until stakeholders find it.
- Fix the data — the immediate priority. Backfill, re-run, mark records, whatever.
- Root-cause — what change introduced this? Was it a schema change, a logic bug, a data drift?
- Prevent — add a test that would have caught this earlier. Document the gotcha in the model's YAML.
- Postmortem for the painful ones. Mature data teams run them like SREs run incident reviews.
Talking points
"Three layers. First, dbt tests on every model — unique, not_null, relationships, accepted_values at minimum. Custom singular tests for business rules. Second, source freshness checks that alert when upstream stops loading. Third, an observability layer — Elementary or Monte Carlo — for anomalies tests can't catch: distribution shifts, volume drops, schema drift. Critical operational piece: severity strategy. Severity-error for true contracts (PK uniqueness, freshness), severity-warn for signal-not-gate. And every test needs an owner — a test that fails and nobody fixes is just noise."