Section A · Orient

The Role, Decoded

What "Senior Data Analytics Engineer" actually means in practice, how it differs from data engineer and analyst roles, and what's specifically different at AI infrastructure companies.

The three-role triangle

Three adjacent titles confuse candidates and hiring managers alike. They overlap, but the centers of gravity differ:

| | Data Engineer | Analytics Engineer | Data Analyst |
| --- | --- | --- | --- |
| Center of gravity | Infrastructure, ingestion, scaling | Transformation, modeling, data products | Business questions, dashboards, ad-hoc |
| Primary languages | Python, SQL, Java/Scala, sometimes Rust | SQL, dbt, Python (secondary) | SQL, spreadsheets, BI tools |
| Lives in | Airflow / Spark / Kafka, infra-as-code | dbt + warehouse + Git | Looker / Tableau / Mode |
| Owns | Ingestion + serving infra | Models in the warehouse + tests + docs | Reports + insights + stakeholder relationships |
| Failure mode | Pipeline outages | Wrong numbers, slow models | Bad recommendations |

"Senior Data Analytics Engineer" specifically

The hybrid title that became standard around 2022. Center of gravity: the transformation layer. Most days you're writing dbt models in SQL, designing star schemas, writing tests, building lineage. You touch data engineering when you need to (Airflow DAGs, warehouse tuning) and you touch analyst work when you need to (stakeholder conversations, metric definitions). You're the bridge that makes the warehouse actually useful.
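
To make that concrete, here's a minimal sketch of the kind of deliverable this role ships: a daily-grain fact model in dbt, joined to a dimension. Every name in it (`stg_api_requests`, `dim_customers`, `fct_daily_usage`) is hypothetical rather than taken from a real warehouse.

```sql
-- models/marts/fct_daily_usage.sql
-- Hypothetical dbt mart model: one row per customer per day.
with requests as (

    select
        customer_id,
        date_trunc('day', requested_at) as usage_date,
        count(*)                        as request_count,
        sum(total_tokens)               as total_tokens,
        sum(billed_amount)              as billed_amount
    from {{ ref('stg_api_requests') }}  -- staging model over raw request logs
    group by 1, 2

)

select
    r.usage_date,
    r.customer_id,
    c.plan_tier,                        -- denormalized from the dimension
    r.request_count,
    r.total_tokens,
    r.billed_amount
from requests r
left join {{ ref('dim_customers') }} c
    on r.customer_id = c.customer_id
```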

What the work actually looks like

A typical week:

  • Build / extend data models in dbt — new dimensions, new metrics, refactoring legacy SQL into modular models with tests.
  • Define metrics — work with finance / product / ops to nail down "what does active user mean here?" Then encode it in the metrics layer.
  • Debug data quality issues — a dashboard's revenue number looks off; trace through lineage, find the broken upstream join.
  • Write tests — unique, not_null, relationships, accepted_values, custom singular tests (see the sketch after this list). Make the suite a gate, not a courtesy.
  • Performance work — a model takes 40 minutes; figure out why (skewed joins, unnecessary CTEs, lack of clustering), fix it.
  • Documentation — YAML descriptions, exposures, lineage that makes the warehouse navigable for non-experts.
  • Stakeholder conversations — translate "what's our GPU margin by region?" into a defensible model.
  • Reviewing PRs — yes, dbt PRs get reviewed. CI runs the test suite against staging.
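
On the testing bullet: generic tests like `unique` and `not_null` are declared in YAML, but a custom singular test is just a SQL file under `tests/` that passes only when it returns zero rows. A hedged sketch, reusing the hypothetical `fct_daily_usage` model above plus an invented `fct_invoices` and an invented 1% tolerance — the kind of gate that catches the fan-out join behind "the revenue number looks off":

```sql
-- tests/assert_invoices_match_usage.sql
-- Singular dbt test: any rows returned mean failure. Flags customers whose
-- invoiced amount for a month drifts more than 1% from the usage mart.
-- Model names and the tolerance are illustrative.
with usage as (

    select
        customer_id,
        date_trunc('month', usage_date) as billing_month,
        sum(billed_amount)              as usage_amount
    from {{ ref('fct_daily_usage') }}
    group by 1, 2

)

select
    i.customer_id,
    i.billing_month,
    i.invoiced_amount,
    u.usage_amount
from {{ ref('fct_invoices') }} i
join usage u
    using (customer_id, billing_month)
where abs(i.invoiced_amount - u.usage_amount) > 0.01 * u.usage_amount
```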

The canonical modern data stack

You're not going to be quizzed on every tool, but you should know the shape of the stack and which players dominate each layer:

| Layer | What it does | Common tools |
| --- | --- | --- |
| Ingestion / EL | Move raw data from sources into the warehouse | Fivetran, Airbyte, Stitch, custom Python, Kafka Connect, Snowpipe |
| Storage | Warehouse or lakehouse where data lives | Snowflake, BigQuery, Redshift, Databricks, ClickHouse |
| Transformation / T | Turn raw → analytics-ready | dbt (dominant), SQLMesh, Dataform |
| Orchestration | Run things in order, retry, schedule | Airflow, Dagster, Prefect, dbt Cloud's scheduler |
| BI / consumption | Dashboards, exploration | Looker, Tableau, Mode, Hex, Metabase, Sigma, Preset |
| Metrics layer | Single source of truth for metric definitions | dbt Semantic Layer, Cube, MetricFlow, LookML |
| Data quality / observability | Monitor freshness, schema, anomalies | Monte Carlo, Elementary, Great Expectations, dbt tests |
| Lineage / catalog | Where does this column come from? | DataHub, OpenLineage, Atlan, dbt docs |
| Reverse ETL | Push warehouse data back to operational tools | Hightouch, Census |

You absolutely need fluency with SQL + dbt + a warehouse (Snowflake or BigQuery). Everything else is "I know what this is for and which one I'd pick when."

At AI / GPU infrastructure companies — what's specific

For roles at AI compute, GPU marketplace, or inference platform companies, the data you'll be modeling has unusual characteristics:

  • High-volume, high-frequency telemetry — GPU utilization metrics every few seconds from thousands of nodes. Time-series flavor. ClickHouse, Druid, or warehouse-native time-series tables get involved.
  • Inference logs — every API request has latency, token counts, model version, customer ID, cost. Petabyte-scale potential. Sampling and aggregation strategies matter.
  • Multi-tenant economics — you're often computing unit economics per customer, per model, per region. "What's our gross margin on Llama-70B for customers in Europe?" is a typical question (see the sketch after this list).
  • Real-time-ish requirements — billing accuracy demands fresh data. Engineering may want hourly cost dashboards. Streaming or micro-batch matters more than at a classic SaaS.
  • Provider-side data (for marketplaces) — if GPUs come from third-party providers, you're tracking supplier utilization, payouts, reliability. Two-sided marketplace metrics.
  • Model performance + cost tradeoffs — analysts asking "if we switched these customers from Opus to Sonnet, what's the quality cost vs the margin gain?" You'll be building those analyses.
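
A hedged sketch of how that margin question might be answered, assuming hypothetical revenue and cost marts keyed by customer, region, and model — none of these table or column names come from a real schema:

```sql
-- Hypothetical: gross margin by customer for one model family in Europe.
with revenue as (

    select customer_id, region, sum(billed_amount) as revenue
    from fct_inference_revenue
    where model_name = 'llama-70b'      -- illustrative model label
      and region like 'eu-%'            -- "customers in Europe"
    group by 1, 2

),

costs as (

    select customer_id, region, sum(gpu_cost) as cost
    from fct_inference_costs
    where model_name = 'llama-70b'
      and region like 'eu-%'
    group by 1, 2

)

select
    r.customer_id,
    r.region,
    r.revenue,
    coalesce(c.cost, 0)                                       as cost,
    (r.revenue - coalesce(c.cost, 0)) / nullif(r.revenue, 0)  as gross_margin
from revenue r
left join costs c
    using (customer_id, region)
```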

See 17-ai-compute-domain for a deeper walkthrough of the data model and key metrics.

If you've never worked at an AI infra company

That's fine. Most candidates haven't. What you can prepare: read the company's pricing page closely (it tells you their unit economics), think about what metrics their finance team needs, and have a few opinions about how you'd model GPU-hours and request logs. That preparation alone differentiates you.
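
One way to build those opinions is to sketch the rollup from raw allocation events to GPU-hours and notice what it forces you to decide. A minimal Snowflake-flavored example; the event table, its columns, and the allocation semantics are all invented:

```sql
-- Hypothetical: GPU-hours per customer per day from allocation events.
-- Assumes one row per allocation with start/end timestamps and a gpu_count.
select
    customer_id,
    date_trunc('day', allocation_start) as usage_date,
    sum(
        gpu_count
        * datediff('second', allocation_start, allocation_end) / 3600.0
    )                                   as gpu_hours
from raw_gpu_allocations
where allocation_end is not null        -- skip still-running allocations
group by 1, 2
```

Even this toy version surfaces the real decisions: how to split allocations that cross midnight, how to count still-running allocations, and whether reserved-but-idle GPUs bill as usage. Having a stance on each is exactly the differentiating preparation.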

JD signals to watch for

Job descriptions for analytics-engineering roles tend to have signature phrases. Decode them:

  • "Build the analytics layer" → dbt-heavy transformation work. Models, tests, semantic layer.
  • "Partner with finance / product / ops" → stakeholder management, metric definitions. Practice translating business → SQL.
  • "Data quality / trust" → testing, alerts, observability. Have a story about how you'd stand up quality checks from scratch.
  • "End-to-end ownership" → expect to design and implement and debug. Generalist mindset.
  • "Self-serve analytics" → enabling analysts and PMs to query without needing you. Implies docs, modeling for accessibility, BI tooling.
  • "Real-time / streaming" → most analytics-engineer roles are batch-first. If this is highlighted, the stack probably includes Kafka + Flink/Spark Streaming, or warehouse-native streaming (Snowflake Dynamic Tables, BigQuery Continuous Queries); see the Dynamic Table sketch after this list.
  • "Performance optimization" → warehouse tuning, query plans, clustering, partitioning. Be ready to talk about why a query is slow, not just how to make it faster.
  • "SDK / API for data" → less common, but at infra companies they may want a data product (e.g. usage dashboard, billing API) as much as internal analytics.
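
On the real-time signal, it helps to know what warehouse-native streaming actually looks like. A hedged sketch using Snowflake Dynamic Tables, with invented table and warehouse names; `target_lag` tells Snowflake how stale the result may get before it refreshes:

```sql
-- Hypothetical Snowflake Dynamic Table: near-real-time hourly cost rollup.
-- Snowflake refreshes it incrementally to stay within the target lag.
create or replace dynamic table hourly_gpu_cost
    target_lag = '15 minutes'
    warehouse  = transform_wh           -- invented warehouse name
as
select
    date_trunc('hour', started_at)           as cost_hour,
    cluster_id,
    sum(gpu_seconds * hourly_rate / 3600.0)  as gpu_cost
from raw_usage_events
group by 1, 2
```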

What to ask them

Strong, role-fit questions to have ready:

  1. "What's the state of the warehouse today — clean, partially modernized, or do you have a wedge of legacy SQL that needs untangling?"
  2. "How is the metrics layer organized? dbt Semantic Layer, LookML, MetricFlow, or convention-based?"
  3. "Who owns data quality alerts when they fire? Is there a rotation?"
  4. "How do data and engineering collaborate on schema changes? Contract or convention?"
  5. "What's the one model in production right now that everyone's afraid to touch, and why?" (Reveals a lot about tech debt.)
  6. "What's the team's biggest analytics question that's still hard to answer with the current setup?" (Reveals where you'd have leverage.)
  7. "How does the team handle ad-hoc analyst requests vs structured modeling work? Where's that line?"

These show you're thinking about the role as a builder and a partner, not just an SQL gun-for-hire.