AI Compute Platform Data
What the data looks like at GPU marketplaces, inference clouds, and AI training platforms, and what interviewers at these companies will expect you to be able to talk about.
What these companies do
The "AI compute" category has a few distinct business models. Many companies span multiple:
- Inference cloud — host open models (Llama, Mistral, etc.) on managed GPUs, charge per token. Examples: Together, Fireworks, Replicate, Anyscale, Lepton.
- GPU marketplace — match third-party GPU providers (cloud providers, decentralized operators) with customers who need compute. Often by-the-hour rentals or on-demand. Examples: Vast.ai, RunPod, Akash.
- Training compute — sell access to multi-GPU clusters for model training. Per-hour, per-cluster.
- Fine-tuning platforms — take a base model + customer data, produce a fine-tuned model, serve it.
- Foundation model providers — train their own large models, serve via API (OpenAI, Anthropic, Mistral). Different business model — more like B2B SaaS than infra.
The data analytics work is more interesting at the infrastructure companies (the first three) — there's real unit economics and operational complexity. Foundation model providers have simpler unit economics but more complex training-time analytics.
The data shape
AI compute platforms produce unusual data compared to a classic SaaS:
- Very high volume telemetry — every GPU emits utilization, memory, and temperature metrics every few seconds. Thousands of GPUs × a reading every few seconds = billions of rows, fast.
- Inference request logs at scale — every API call has latency, tokens, cost, customer, model, region. Petabyte scale is realistic.
- Time-series flavor — much of the data is time-indexed metrics, not classic transactional data. ClickHouse, Druid, or warehouse time-series tables matter.
- Multi-tenant with stark unit economics — every customer's behavior directly translates to cost. Margin is computable per-customer, per-model, per-region.
- Real-time-ish needs — billing, capacity planning, alerting. Pure daily-batch isn't enough.
- Two-sided marketplace data (for marketplaces) — supply side (providers, their availability and reliability) and demand side (customers, their consumption).
Unit economics — the core analytical problem
The single most-asked-about analytical question at AI compute companies: "are we making money on this customer / this model / this region?"
The basic equation
For any slice you care about (a customer, a model, a region): gross margin = revenue billed for that slice minus the cost to serve it, where cost to serve ≈ GPU-hours consumed × cost per GPU-hour (provider payout or amortized hardware cost).
The leverage
Unit margin scales with two things:
- Utilization — a GPU rented at $X/hour serving 100 requests/hour costs $X/100 per request. The same GPU serving 1000 requests/hour costs $X/1000. Utilization is the dominant lever on margin.
- Throughput per dollar — better batching, better inference servers (vLLM, TensorRT-LLM), better quantization push throughput up at the same hardware cost.
Analytics work follows: utilization dashboards, throughput tracking, per-customer-per-model margin reports.
If asked "what would you build first?" at an inference platform, "a per-customer-per-model gross-margin model" is a strong answer. It connects directly to leadership's #1 question and forces you to model the data right.
GPU utilization — the load-bearing metric
"Utilization" is overloaded — make sure you know which one is being asked:
| Metric | What it means | How it's computed |
|---|---|---|
| Compute utilization | % of GPU cycles doing work | From nvidia-smi: utilization.gpu |
| Memory utilization | % of VRAM in use | memory.used / memory.total |
| Time utilization | % of time the GPU was assigned to a customer (vs idle) | Sum of allocated_seconds / total_seconds |
| Revenue utilization | Revenue generated per GPU-hour | Sum of request revenue / GPU-hours |
| MFU (Model FLOPs Utilization) | Fraction of theoretical peak FLOPs actually achieved | Achieved FLOPs / peak FLOPs; used in training, harder to measure |
The modeling layer
Periodic snapshot fact at hourly grain is standard:
CREATE TABLE fct_gpu_hourly_utilization (
gpu_id STRING,
hour_ts TIMESTAMP,
utilization_pct FLOAT, -- compute, averaged over the hour
memory_used_gb FLOAT, -- VRAM, averaged
assigned_minutes INT, -- minutes the GPU was allocated to a customer
requests_served INT, -- inference requests handled
revenue_usd FLOAT, -- revenue attributed to this GPU this hour
cost_usd FLOAT, -- provider payout or amortized hardware cost
current_customer_id STRING, -- if single-tenant during this hour
current_model_id STRING, -- model loaded if applicable
PRIMARY KEY (gpu_id, hour_ts)
);
From this single fact you can answer:
- Fleet-wide utilization by region / provider / GPU type.
- Idle GPU-hours (assigned_minutes < 60).
- Revenue per GPU per day / week / month.
- Margin per GPU (revenue - cost).
utilization_pct is averaged, not summed, when you roll up over time. Don't sum utilization percentages across hours — you'll get a meaningless number. Average instead. This is the classic semi-additive measure trap.
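A daily rollup sketch that treats the semi-additive columns correctly, averaging the percentages and summing the additive measures:

-- Daily rollup from the hourly GPU fact.
-- utilization_pct and memory_used_gb are semi-additive: average them.
-- Minutes, requests, revenue, and cost are fully additive: sum them.
SELECT
  CAST(hour_ts AS DATE)              AS date_key,
  gpu_id,
  AVG(utilization_pct)               AS avg_utilization_pct,
  AVG(memory_used_gb)                AS avg_memory_used_gb,
  SUM(assigned_minutes)              AS assigned_minutes,
  SUM(requests_served)               AS requests_served,
  SUM(revenue_usd)                   AS revenue_usd,
  SUM(revenue_usd) - SUM(cost_usd)   AS margin_usd
FROM fct_gpu_hourly_utilization
GROUP BY 1, 2;

If hours can be partial (a GPU onboarded or decommissioned mid-hour), weight the average by observed minutes rather than using a plain AVG.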
Inference logs — petabyte territory
Every API request to an inference platform produces a log entry. At scale this is the biggest dataset in the warehouse.
Schema
CREATE TABLE fct_inference_requests (
request_id STRING,
customer_id STRING,
api_key_id STRING, -- which key, in case customer has many
model_id STRING,
gpu_id STRING, -- which GPU handled it
region_id STRING,
occurred_at TIMESTAMP,
date_key DATE, -- for partitioning
latency_ms INT, -- end-to-end
ttft_ms INT, -- time to first token (streaming)
input_tokens INT,
output_tokens INT,
cost_usd DECIMAL(18,8), -- compute cost
revenue_usd DECIMAL(18,8), -- billed to customer
status STRING, -- 'success' | 'timeout' | 'error' | 'rate_limited'
error_code STRING,
request_metadata VARIANT -- or your warehouse's JSON type: temperature, max_tokens, system_prompt_hash, etc.
);
-- Partition by date_key, cluster by (customer_id, model_id)
Aggregation strategy
Don't serve this raw to analysts. Build a layered aggregation:
- Raw request logs: cheap object storage, partitioned by date, kept for debugging and reprocessing.
- 5-minute aggregates: ops dashboards and alerting.
- Hourly aggregates: cost and margin analysis, capacity planning.
- Daily aggregates: finance, retention, customer-facing usage summaries.
Most queries hit the hourly or daily aggs. Raw is for spot debugging.
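A sketch of the hourly layer, built incrementally from the raw request fact so each run scans only its target date partition; agg_inference_hourly is an illustrative name, not a prescribed one:

-- Hourly usage aggregate per customer x model x region.
INSERT INTO agg_inference_hourly  -- hypothetical target table
SELECT
  DATE_TRUNC('hour', occurred_at)                       AS hour_ts,
  customer_id,
  model_id,
  region_id,
  COUNT(*)                                              AS requests,
  SUM(input_tokens)                                     AS input_tokens,
  SUM(output_tokens)                                    AS output_tokens,
  SUM(revenue_usd)                                      AS revenue_usd,
  SUM(cost_usd)                                         AS cost_usd,
  AVG(latency_ms)                                       AS avg_latency_ms,
  SUM(CASE WHEN status <> 'success' THEN 1 ELSE 0 END)  AS errors
FROM fct_inference_requests
WHERE date_key = CURRENT_DATE - 1  -- or the run's target date; keeps the scan to one partition
GROUP BY 1, 2, 3, 4;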
The token-counting gotcha
Two ways to count tokens differ by ~5-15%:
- Tokenizer output — what the model actually consumed.
- Billable tokens — what you charge for (may round, may have a minimum).
Be explicit about which one a metric is using. "Revenue uses billable_tokens; throughput uses tokenizer_tokens."
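A sketch of keeping the two counts as separate metrics; the billable_input_tokens / billable_output_tokens columns are hypothetical, standing in for however billing rounds the tokenizer counts:

-- Two deliberately separate metrics so nobody mixes them up.
-- input_tokens / output_tokens: tokenizer counts (what the model consumed).
-- billable_*_tokens: hypothetical columns for what the customer is charged.
SELECT
  customer_id,
  date_key,
  SUM(input_tokens + output_tokens)                     AS tokenizer_tokens,  -- throughput metrics
  SUM(billable_input_tokens + billable_output_tokens)   AS billable_tokens,   -- revenue metrics
  SUM(revenue_usd) * 1000
    / NULLIF(SUM(billable_input_tokens + billable_output_tokens), 0)
                                                        AS revenue_per_1k_billable_tokens
FROM fct_inference_requests
GROUP BY 1, 2;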
Marketplace dynamics (if applicable)
If the company is a two-sided marketplace (GPU providers → customers), the data has supply-side and demand-side dimensions.
Supply side
- Providers — who supplies GPUs. Their reliability, payout terms, geographic distribution.
- GPU inventory — what's available, what's reserved, what's offline.
- Provider utilization — are providers' GPUs being used? Idle GPUs = wasted supply.
- Provider payouts — billing them, attributing revenue.
- Reliability metrics — uptime, failure rates, response time.
Demand side
- Customer onboarding funnel — signup → first request → meaningful usage.
- Cohort retention — do customers stay or churn?
- Concentration risk — top 5 customers' share of revenue.
- Model preferences — which models drive demand.
Marketplace-specific metrics
- Take rate — % of GMV the platform keeps (see the sketch after this list).
- Match quality — how well supply meets demand (queue depth, request failures due to no available GPU).
- Geographic balance — is supply in the same regions as demand?
- Liquidity — for new entrants on either side, how long until they get utilization / get matched.
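A minimal take-rate sketch, assuming cost_usd in fct_gpu_hourly_utilization holds the provider payout for marketplace-supplied GPUs:

-- Monthly take rate: GMV is what customers were billed,
-- payouts are what providers were paid (cost_usd, by assumption).
SELECT
  DATE_TRUNC('month', hour_ts)          AS month,
  SUM(revenue_usd)                      AS gmv_usd,
  SUM(cost_usd)                         AS provider_payouts_usd,
  (SUM(revenue_usd) - SUM(cost_usd))
    / NULLIF(SUM(revenue_usd), 0)       AS take_rate
FROM fct_gpu_hourly_utilization
GROUP BY 1;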
Billing & invoicing data
Billing data is the most-scrutinized data at an infra company. Finance audits it; customers dispute it; SOC2 wants tamper-evidence. Special considerations:
- Idempotency is non-negotiable — billing the same request twice gets you sued. Every billing event needs an idempotency key.
- Append-only history — never update a billing record in place. Append a correction event (sketched after this list).
- Currency / FX — bill in customer's currency, account in USD, store rate at time-of-transaction.
- Pro-rating — partial months, plan changes mid-cycle.
- Credits and refunds — model these as separate event types with explicit references to the original charge.
- Latency tolerance — billing data must be eventually accurate, often within hours. Real-time isn't required, but reconciliation matters.
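A sketch of an append-only billing events table with an idempotency key and explicit correction links; the table and column names are illustrative rather than a prescribed schema:

CREATE TABLE fct_billing_events (
  billing_event_id    STRING,         -- surrogate key
  idempotency_key     STRING,         -- unique per source event; enforce uniqueness in the pipeline
  customer_id         STRING,
  event_type          STRING,         -- 'charge' | 'credit' | 'refund' | 'correction'
  references_event_id STRING,         -- for credits/refunds/corrections: the original charge
  amount_usd          DECIMAL(18,8),  -- accounting currency
  amount_billed       DECIMAL(18,8),  -- customer's currency
  currency            STRING,
  fx_rate             DECIMAL(18,8),  -- rate at time of transaction
  occurred_at         TIMESTAMP,
  recorded_at         TIMESTAMP       -- append time; rows are never updated or deleted
);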
Multi-tenant analytics
Two distinct multi-tenant questions:
1. Customer-facing analytics (the data product)
If the platform shows each customer their own usage dashboard, you're effectively building a SaaS analytics product. Concerns:
- Row-level security / tenant isolation — customer A must never see customer B's data.
- Latency requirements — customer-facing dashboards expect sub-second response. Pre-aggregate + cache (see the sketch after this list).
- Custom rollups — customers may want to slice by their own tags / API keys.
- Embedded BI — Looker, Cube, ClickHouse + a thin app — pick a serving pattern early.
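One way to back the customer-facing dashboard is a small pre-aggregated serving table that the application (or a row-level security policy) filters by the authenticated tenant; a sketch under that assumption:

-- Daily usage per tenant, small enough for a fast OLAP store or a cached serving layer.
-- The app (or RLS) must filter by the caller's customer_id; never serve this unfiltered.
CREATE TABLE serve_customer_daily_usage AS  -- hypothetical serving table
SELECT
  customer_id,
  api_key_id,
  model_id,
  date_key,
  COUNT(*)           AS requests,
  SUM(input_tokens)  AS input_tokens,
  SUM(output_tokens) AS output_tokens,
  SUM(revenue_usd)   AS billed_usd
FROM fct_inference_requests
GROUP BY 1, 2, 3, 4;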
2. Internal cross-tenant analytics (the data team's work)
Finance / product / ops need cross-tenant analyses: "top 10 customers by usage", "new-customer cohort retention by signup month." Concerns:
- Access controls — who internally can query PII vs aggregated?
- Audit logs — who queried what when.
- Tokenization where customer identities aren't needed.
Key metrics to be conversational about
Revenue / customer
- ARR / MRR — but careful, usage-based pricing makes ARR a derived projection.
- Revenue per customer per month — clear, used everywhere.
- NRR (Net Revenue Retention) — same-customer revenue growth, including churn (sketched after this list).
- GRR (Gross Revenue Retention) — same-customer retention without expansion; always at or below NRR.
- Concentration — top-N customers' share of revenue.
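A monthly NRR sketch built from the request fact; DATEADD syntax varies by warehouse:

-- NRR for month M: revenue in M from customers who had revenue in M-1,
-- divided by those same customers' revenue in M-1. New customers are excluded;
-- churned customers count as zero in the numerator.
WITH monthly AS (
  SELECT customer_id,
         DATE_TRUNC('month', occurred_at) AS month,
         SUM(revenue_usd)                 AS revenue_usd
  FROM fct_inference_requests
  GROUP BY 1, 2
)
SELECT
  DATEADD('month', 1, prev.month)           AS month,
  SUM(COALESCE(curr.revenue_usd, 0))
    / NULLIF(SUM(prev.revenue_usd), 0)      AS nrr
FROM monthly AS prev
LEFT JOIN monthly AS curr
  ON curr.customer_id = prev.customer_id
 AND curr.month = DATEADD('month', 1, prev.month)
GROUP BY 1;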
Cost / infrastructure
- COGS — cost of goods sold; for infra companies, mostly GPU costs.
- Gross margin — revenue minus COGS.
- GPU utilization rate — fleet-wide, by region, by GPU type.
- Cost per token / per request / per GPU-hour.
Reliability / ops
- p50 / p95 / p99 latency — by model, by region (sketched after this list).
- TTFT (time to first token) — for streaming inference.
- Success rate / error rate — overall and per-error-class.
- GPU failure rate — per provider.
- Queue depth / wait time — when demand exceeds supply.
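A sketch of the daily latency and reliability rollup per model and region; percentile syntax varies by warehouse (PERCENTILE_CONT ... WITHIN GROUP here, approx_percentile or APPROX_QUANTILES elsewhere):

-- Daily latency distribution and success rate per model and region.
SELECT
  date_key,
  model_id,
  region_id,
  PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY latency_ms) AS p50_latency_ms,
  PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY latency_ms) AS p95_latency_ms,
  PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY latency_ms) AS p99_latency_ms,
  PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY ttft_ms)    AS p95_ttft_ms,
  AVG(CASE WHEN status = 'success' THEN 1.0 ELSE 0.0 END)  AS success_rate
FROM fct_inference_requests
GROUP BY 1, 2, 3;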
Customer behavior
- Activation — first meaningful request after signup.
- Time-to-first-value — signup to first successful production-volume usage.
- Cohort retention — % of signup cohort still active at month N (sketched after this list).
- Model adoption — share of usage by model.
- Workload mix — chat vs batch vs streaming vs fine-tuning.
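A cohort-retention sketch assuming a dim_customers table with one row per customer and a signup_date; "active" is simplified to at least one successful request in the month, and DATEDIFF syntax varies by warehouse:

-- % of each signup cohort still active N months after signup.
WITH activity AS (
  SELECT DISTINCT customer_id,
         DATE_TRUNC('month', occurred_at) AS active_month
  FROM fct_inference_requests
  WHERE status = 'success'
),
cohorts AS (
  SELECT customer_id,
         DATE_TRUNC('month', signup_date) AS cohort_month  -- dim_customers is assumed
  FROM dim_customers
),
cohort_sizes AS (
  SELECT cohort_month, COUNT(*) AS cohort_size
  FROM cohorts
  GROUP BY 1
)
SELECT
  c.cohort_month,
  DATEDIFF('month', c.cohort_month, a.active_month)        AS months_since_signup,
  COUNT(DISTINCT a.customer_id) * 1.0 / MAX(s.cohort_size) AS retention_pct
FROM cohorts c
JOIN activity a     ON a.customer_id = c.customer_id
JOIN cohort_sizes s ON s.cohort_month = c.cohort_month
GROUP BY 1, 2;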
Interview talking points
"Three groups. First, unit economics — gross margin per customer per model per region. That's the question executives lose sleep over and it requires modeling the data right end-to-end. Second, utilization — both GPU compute utilization and 'are we serving requests on the GPUs we're paying for.' Idle GPU-hours are pure cost. Third, reliability — p99 latency, success rate, TTFT for streaming. Infra customers measure us on these. Most other metrics — onboarding funnel, customer retention — apply at any SaaS but matter especially because expansion revenue at infra companies depends on customer workloads scaling, which depends on the platform being trustworthy."
"First — is it a real signal or a data issue? Check the data pipeline, source freshness, and any recent model or metric changes. If real, decompose. Margin = revenue minus cost; which side moved? If revenue, segment by customer and model — did a high-margin customer churn or downgrade? If cost, segment by GPU provider, region, model — did a region's GPU costs spike, did utilization drop somewhere? The decomposition almost always points at a specific tenant, region, or model. Once isolated, the question shifts from analytics to ops — why did that change."
"Don't serve raw — pre-aggregate. Raw stays in cheap object storage, partitioned by date and customer. Aggregations live in the warehouse: 5-minute aggs for ops dashboards, hourly for cost analysis, daily for retention and finance. Most queries hit aggregations; raw is for spot debugging with strict partition filters. For sub-second customer-facing dashboards, push the daily aggs to a fast OLAP store — ClickHouse or warehouse materialized views. Two anti-patterns to avoid: scanning raw without partition filters (one query can cost more than a month of compute), and over-aggregating so analysts can't answer new questions without a heavy re-aggregation job."