Data Observability
Lineage, freshness, volume, schema, distribution. The five things to monitor for any production data system — plus the tools and the discipline around alerts.
The five pillars (Monte Carlo's framing, now standard)
- Freshness — when did data last update?
- Volume — how many rows arrived?
- Schema — what columns and types?
- Distribution — what do values look like?
- Lineage — where does this come from, what depends on it?
Mention these by name. They're industry standard vocabulary.
Lineage
Knowing how a column propagated from source through transformations to the dashboard. Critical for:
- Impact analysis — "if we change this raw column, what breaks?"
- Root-cause analysis — "this dashboard is wrong, where did the bad value enter?"
- Compliance — "where does this PII end up?"
Tools: dbt docs (lineage within dbt), OpenLineage (cross-tool standard), DataHub, Atlan, Alation (commercial catalogs with lineage).
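dbt's lineage graph stops at the warehouse unless you declare what consumes your models. A minimal sketch of a dbt exposure that makes a dashboard a node in the graph; the dashboard, model names, owner, and URL below are hypothetical:

```yaml
# models/exposures.yml (illustrative names and URL)
version: 2

exposures:
  - name: revenue_dashboard
    type: dashboard
    maturity: high
    url: https://bi.example.com/dashboards/revenue
    description: Executive revenue dashboard fed by the marts below.
    owner:
      name: Analytics Team
      email: analytics@example.com
    depends_on:
      - ref('fct_orders')
      - ref('dim_customers')
```

With this in place, `dbt docs generate` renders the dashboard as a terminal node, so impact analysis ("what breaks downstream?") extends past the last model.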
Freshness
Stale data lies silently. Monitor at every stage:
- Source freshness — when did the upstream system last load? dbt source freshness + alerts.
- Model freshness — when did the dbt model last run? dbt-utils recency tests.
- Dashboard freshness — when did the BI tool last refresh? Most BI tools surface this.
Set thresholds based on SLA. A daily-batch model should warn if not refreshed in 25 hours, error after 30. A near-real-time model: warn in minutes.
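A sketch of both checks in dbt, assuming a source called `app_db` and a model called `fct_orders` with a `loaded_at` timestamp; names and thresholds are illustrative:

```yaml
version: 2

sources:
  - name: app_db
    loaded_at_field: _loaded_at          # column the loader stamps on arrival
    freshness:
      warn_after: {count: 25, period: hour}
      error_after: {count: 30, period: hour}
    tables:
      - name: orders

models:
  - name: fct_orders
    tests:
      # dbt_utils.recency: fail if no row is newer than 25 hours
      - dbt_utils.recency:
          datepart: hour
          field: loaded_at
          interval: 25
```

`dbt source freshness` runs the source check and is the natural thing to schedule right before the daily build; the recency test runs with the rest of `dbt test`.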
Volume
Row counts in expected range. Common patterns:
- Absolute — table has at least X rows.
- Relative — today's rows are within N% of trailing average.
- Statistical — N standard deviations from baseline.
Spikes and drops both matter. A spike could be duplicate loads; a drop could be a broken ingestion job. Both deserve alerts.
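A sketch of the absolute and relative patterns with `dbt_expectations.expect_table_row_count_to_be_between`; the model name, date filter, and thresholds are invented, and the date arithmetic will need adjusting to your warehouse. The statistical pattern is what Elementary or a commercial tool layers on top by learning a baseline instead of you hard-coding one.

```yaml
version: 2

models:
  - name: fct_orders
    tests:
      # Absolute: the full table should never be implausibly small
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 100000
      # Relative: yesterday's partition should land in a sane band
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 8000
          max_value: 15000
          row_condition: "order_date = current_date - 1"
```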
Schema
Column additions are usually safe. Removals, renames, and type changes are usually bugs. Monitor:
- Expected columns exist.
- Types haven't changed.
- Unexpected new columns flagged for review (in case they're PII).
dbt model contracts make this explicit at the model level. Schema registries do it at the event-stream level.
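A sketch of a dbt model contract (dbt 1.5+): the build fails if the model's SQL stops producing these columns with these types. Model and column names are hypothetical, and the data types need to be valid for your warehouse.

```yaml
version: 2

models:
  - name: fct_orders
    config:
      contract:
        enforced: true       # dbt compares this declared shape against the compiled SQL
    columns:
      - name: order_id
        data_type: bigint
        constraints:
          - type: not_null
      - name: order_total
        data_type: numeric
      - name: order_date
        data_type: date
```

On models without contracts, dbt-expectations has column-matching tests that cover similar ground.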
Distribution
The "values look right" pillar. Things to watch:
- Null proportion — sudden jump in nulls = upstream broke.
- Value distribution — mean, p50, p95 of numeric columns. Sudden shifts = anomaly.
- Cardinality — distinct count of categorical columns. New / disappeared categories.
- Outlier rate — rows in tails of distribution.
This is where ML-driven tools (Monte Carlo, Anomalo) shine — they baseline distributions automatically and flag deviations.
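Cheap in-warehouse versions of these checks are still worth having. A sketch using dbt-utils and dbt-expectations; column names and thresholds are invented, and an ML tool would learn the bands instead of you hard-coding them:

```yaml
version: 2

models:
  - name: fct_orders
    columns:
      - name: order_total
        tests:
          # Null proportion: at least 99% of values populated
          - dbt_utils.not_null_proportion:
              at_least: 0.99
          # Central tendency: mean should stay inside a plausible band
          - dbt_expectations.expect_column_mean_to_be_between:
              min_value: 20
              max_value: 200
              config:
                severity: warn
      - name: order_status
        tests:
          # Cardinality: a new or vanished category fails this test
          - accepted_values:
              values: ['placed', 'shipped', 'delivered', 'returned']
```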
Tools landscape
| Tool | Class | Best for |
|---|---|---|
| dbt tests + dbt-utils | OSS, in-warehouse | Baseline — every project should have these |
| dbt-expectations | OSS, in-warehouse | Distribution and statistical tests |
| Elementary | OSS, dbt-native | Anomaly detection + observability dashboard over dbt artifacts |
| Great Expectations | OSS, Python-first | Standalone validation pipelines, non-dbt environments |
| Monte Carlo, Bigeye, Anomalo | Commercial, ML-driven | Org-wide observability with less config |
| DataHub, Atlan, Alation | Commercial catalogs | Lineage + catalog + governance at scale |
| OpenLineage | OSS standard | Cross-tool lineage protocol |
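The OSS baseline rows of the table are a one-file change. A sketch of `packages.yml`; the version ranges are illustrative, so pin to whatever is current on the dbt Package Hub and install with `dbt deps`:

```yaml
# packages.yml (version ranges are illustrative)
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]
  - package: calogica/dbt_expectations
    version: [">=0.10.0", "<0.11.0"]
  - package: elementary-data/elementary
    version: [">=0.13.0", "<0.14.0"]
```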
Alerts that don't get ignored
The fastest way to break a data team's trust in its own monitoring: pages that fire too often. Discipline:
- Severity tiers — error (page), warn (Slack channel), info (dashboard only).
- Owners on every alert — each alert is tied to a model and to a named person or team.
- Runbooks — when alert X fires, do Y to investigate. Documented in the model's YAML or the team wiki.
- Quiet hours — non-critical alerts pause overnight; only true outages page at 3 AM.
- Alert review — weekly check: which alerts fired, which were noise, tune accordingly.
Every "let me add a test for this" without thinking about severity leads to a future where everything fires constantly and nothing gets investigated. Tests are free to write; alerts are expensive to act on. Be conservative with what pages.
Talking points
"Five pillars — freshness, volume, schema, distribution, lineage. Freshness via source freshness checks and recency tests; volume via row-count anomaly detection; schema via tests on expected columns and dbt contracts; distribution via dbt-expectations or a commercial tool like Monte Carlo; lineage via dbt docs at minimum, OpenLineage or DataHub at scale. Severity discipline matters — error level for true contracts, warn for signal. Every alert needs an owner and a runbook, or it becomes noise."
"Walk the lineage backwards. Start at the dashboard — what model is it pulling from. Then the model — what does its SQL do, what's the grain, do the tests pass. Then sources — is upstream loading correctly. Common bugs at each layer: fanout from a wrong-grain join, NULL filters dropping rows silently, time-zone confusion, deduplication that didn't deduplicate, a stale incremental that lost late-arriving data. Once root-caused: fix the data, add a test that would have caught it, document the gotcha. Optional final step: postmortem if it's painful enough."