AI Compute Platform Data
What the data looks like at GPU marketplaces, inference clouds, and AI training platforms, and what interviewers at these companies will expect you to be able to talk about.
What these companies do
The "AI compute" category has a few distinct business models. Many companies span multiple:
- Inference cloud — host open models (Llama, Mistral, etc.) on managed GPUs, charge per token. Examples: Together, Fireworks, Replicate, Anyscale, Lepton.
- GPU marketplace — match third-party GPU providers (cloud providers, decentralized operators) with customers who need compute. Often by-the-hour rentals or on-demand. Examples: Vast.ai, RunPod, Akash.
- Training compute — sell access to multi-GPU clusters for model training. Per-hour, per-cluster.
- Fine-tuning platforms — take a base model + customer data, produce a fine-tuned model, serve it.
- Foundation model providers — train their own large models, serve via API (OpenAI, Anthropic, Mistral). Different business model — more like B2B SaaS than infra.
The data analytics work is more interesting at the infrastructure companies (the first three) — there's real unit economics and operational complexity. Foundation model providers have simpler unit economics but more complex training-time analytics.
The data shape
AI compute platforms produce unusual data compared to a classic SaaS:
- Very high volume telemetry — every GPU emits utilization, memory, and temperature metrics every few seconds. Thousands of GPUs × a reading every few seconds = billions of rows, fast.
- Inference request logs at scale — every API call has latency, tokens, cost, customer, model, region. Petabyte scale is realistic.
- Time-series flavor — much of the data is time-indexed metrics, not classic transactional data. ClickHouse, Druid, or warehouse time-series tables matter.
- Multi-tenant with stark unit economics — every customer's behavior directly translates to cost. Margin is computable per-customer, per-model, per-region.
- Real-time-ish needs — billing, capacity planning, alerting. Pure daily-batch isn't enough.
- Two-sided marketplace data (for marketplaces) — supply side (providers, their availability and reliability) and demand side (customers, their consumption).
Unit economics — the core analytical problem
The single most-asked-about analytical question at AI compute companies: "are we making money on this customer / this model / this region?"
The basic equation
For any slice you care about (a customer, a model, a region): gross margin = revenue billed for that slice minus the cost to serve it, where cost to serve ≈ GPU-hours consumed × cost per GPU-hour (provider payout or amortized hardware cost).
The leverage
Unit margin scales with two things:
- Utilization — a GPU rented at $X/hour serving 100 requests/hour costs $X/100 per request. The same GPU serving 1000 requests/hour costs $X/1000. Utilization is the dominant lever on margin.
- Throughput per dollar — better batching, better inference servers (vLLM, TensorRT-LLM), better quantization push throughput up at the same hardware cost.
Analytics work follows: utilization dashboards, throughput tracking, per-customer-per-model margin reports.
If asked "what would you build first?" at an inference platform, "a per-customer-per-model gross-margin model" is a strong answer. It connects directly to leadership's #1 question and forces you to model the data right.
GPU utilization — the load-bearing metric
"Utilization" is overloaded — make sure you know which one is being asked:
| Metric | What it means | How it's computed |
|---|---|---|
| Compute utilization | % of GPU cycles doing work | From nvidia-smi: utilization.gpu |
| Memory utilization | % of VRAM in use | memory.used / memory.total |
| Time utilization | % of time the GPU was assigned to a customer (vs idle) | Sum of allocated_seconds / total_seconds |
| Revenue utilization | Revenue generated per GPU-hour | Sum of request revenue / GPU-hours |
| MFU (Model FLOPs Utilization) | Fraction of theoretical peak FLOPs actually achieved | Achieved FLOPs / peak FLOPs; used in training, harder to measure |
The modeling layer
Periodic snapshot fact at hourly grain is standard:
CREATE TABLE fct_gpu_hourly_utilization (
gpu_id STRING,
hour_ts TIMESTAMP,
utilization_pct FLOAT, -- compute, averaged over the hour
memory_used_gb FLOAT, -- VRAM, averaged
assigned_minutes INT, -- minutes the GPU was allocated to a customer
requests_served INT, -- inference requests handled
revenue_usd FLOAT, -- revenue attributed to this GPU this hour
cost_usd FLOAT, -- provider payout or amortized hardware cost
current_customer_id STRING, -- if single-tenant during this hour
current_model_id STRING, -- model loaded if applicable
PRIMARY KEY (gpu_id, hour_ts)
);
From this single fact you can answer:
- Fleet-wide utilization by region / provider / GPU type.
- Idle GPU-hours (assigned_minutes < 60).
- Revenue per GPU per day / week / month.
- Margin per GPU (revenue - cost).
utilization_pct is averaged, not summed, when you roll up over time. Don't sum utilization percentages across hours — you'll get a meaningless number. Average instead. This is the classic semi-additive measure trap.
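A daily rollup sketch that treats the semi-additive columns correctly, averaging the percentages and summing the additive measures:

-- Daily rollup from the hourly GPU fact.
-- utilization_pct and memory_used_gb are semi-additive: average them.
-- Minutes, requests, revenue, and cost are fully additive: sum them.
SELECT
  CAST(hour_ts AS DATE)              AS date_key,
  gpu_id,
  AVG(utilization_pct)               AS avg_utilization_pct,
  AVG(memory_used_gb)                AS avg_memory_used_gb,
  SUM(assigned_minutes)              AS assigned_minutes,
  SUM(requests_served)               AS requests_served,
  SUM(revenue_usd)                   AS revenue_usd,
  SUM(revenue_usd) - SUM(cost_usd)   AS margin_usd
FROM fct_gpu_hourly_utilization
GROUP BY 1, 2;

If hours can be partial (a GPU onboarded or decommissioned mid-hour), weight the average by observed minutes rather than using a plain AVG.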
Inference logs — petabyte territory
Every API request to an inference platform produces a log entry. At scale this is the biggest dataset in the warehouse.
Schema
CREATE TABLE fct_inference_requests (
request_id STRING,
customer_id STRING,
api_key_id STRING, -- which key, in case customer has many
model_id STRING,
gpu_id STRING, -- which GPU handled it
region_id STRING,
occurred_at TIMESTAMP,
date_key DATE, -- for partitioning
latency_ms INT, -- end-to-end
ttft_ms INT, -- time to first token (streaming)
input_tokens INT,
output_tokens INT,
cost_usd DECIMAL(18,8), -- compute cost
revenue_usd DECIMAL(18,8), -- billed to customer
status STRING, -- 'success' | 'timeout' | 'error' | 'rate_limited'
error_code STRING,
request_metadata VARIANT -- or your warehouse's JSON type: temperature, max_tokens, system_prompt_hash, etc.
);
-- Partition by date_key, cluster by (customer_id, model_id)
Aggregation strategy
Don't serve this raw to analysts. Build a layered aggregation:
- Raw request logs: cheap object storage, partitioned by date, kept for debugging and reprocessing.
- 5-minute aggregates: ops dashboards and alerting.
- Hourly aggregates: cost and margin analysis, capacity planning.
- Daily aggregates: finance, retention, customer-facing usage summaries.
Most queries hit the hourly or daily aggs. Raw is for spot debugging.
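A sketch of the hourly layer, built incrementally from the raw request fact so each run scans only its target date partition; agg_inference_hourly is an illustrative name, not a prescribed one:

-- Hourly usage aggregate per customer x model x region.
INSERT INTO agg_inference_hourly  -- hypothetical target table
SELECT
  DATE_TRUNC('hour', occurred_at)                       AS hour_ts,
  customer_id,
  model_id,
  region_id,
  COUNT(*)                                              AS requests,
  SUM(input_tokens)                                     AS input_tokens,
  SUM(output_tokens)                                    AS output_tokens,
  SUM(revenue_usd)                                      AS revenue_usd,
  SUM(cost_usd)                                         AS cost_usd,
  AVG(latency_ms)                                       AS avg_latency_ms,
  SUM(CASE WHEN status <> 'success' THEN 1 ELSE 0 END)  AS errors
FROM fct_inference_requests
WHERE date_key = CURRENT_DATE - 1  -- or the run's target date; keeps the scan to one partition
GROUP BY 1, 2, 3, 4;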
The token-counting gotcha
Two ways to count tokens differ by ~5-15%:
- Tokenizer output — what the model actually consumed.
- Billable tokens — what you charge for (may round, may have a minimum).
Be explicit about which one a metric is using. "Revenue uses billable_tokens; throughput uses tokenizer_tokens."
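A sketch of keeping the two counts as separate metrics; the billable_input_tokens / billable_output_tokens columns are hypothetical, standing in for however billing rounds the tokenizer counts:

-- Two deliberately separate metrics so nobody mixes them up.
-- input_tokens / output_tokens: tokenizer counts (what the model consumed).
-- billable_*_tokens: hypothetical columns for what the customer is charged.
SELECT
  customer_id,
  date_key,
  SUM(input_tokens + output_tokens)                     AS tokenizer_tokens,  -- throughput metrics
  SUM(billable_input_tokens + billable_output_tokens)   AS billable_tokens,   -- revenue metrics
  SUM(revenue_usd) * 1000
    / NULLIF(SUM(billable_input_tokens + billable_output_tokens), 0)
                                                        AS revenue_per_1k_billable_tokens
FROM fct_inference_requests
GROUP BY 1, 2;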
Marketplace dynamics (if applicable)
If the company is a two-sided marketplace (GPU providers → customers), the data has supply-side and demand-side dimensions.
Supply side
- Providers — who supplies GPUs. Their reliability, payout terms, geographic distribution.
- GPU inventory — what's available, what's reserved, what's offline.
- Provider utilization — are providers' GPUs being used? Idle GPUs = wasted supply.
- Provider payouts — billing them, attributing revenue.
- Reliability metrics — uptime, failure rates, response time.
Demand side
- Customer onboarding funnel — signup → first request → meaningful usage.
- Cohort retention — do customers stay or churn?
- Concentration risk — top 5 customers' share of revenue.
- Model preferences — which models drive demand.
Marketplace-specific metrics
- Take rate — % of GMV the platform keeps (see the sketch after this list).
- Match quality — how well supply meets demand (queue depth, request failures due to no available GPU).
- Geographic balance — is supply in the same regions as demand?
- Liquidity — for new entrants on either side, how long until they get utilization / get matched.
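A minimal take-rate sketch, assuming cost_usd in fct_gpu_hourly_utilization holds the provider payout for marketplace-supplied GPUs:

-- Monthly take rate: GMV is what customers were billed,
-- payouts are what providers were paid (cost_usd, by assumption).
SELECT
  DATE_TRUNC('month', hour_ts)          AS month,
  SUM(revenue_usd)                      AS gmv_usd,
  SUM(cost_usd)                         AS provider_payouts_usd,
  (SUM(revenue_usd) - SUM(cost_usd))
    / NULLIF(SUM(revenue_usd), 0)       AS take_rate
FROM fct_gpu_hourly_utilization
GROUP BY 1;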
Billing & invoicing data
Billing data is the most-scrutinized data at an infra company. Finance audits it; customers dispute it; SOC2 wants tamper-evidence. Special considerations:
- Idempotency is non-negotiable — billing the same request twice gets you sued. Every billing event needs an idempotency key.
- Append-only history — never update a billing record in place. Append a correction event (sketched after this list).
- Currency / FX — bill in customer's currency, account in USD, store rate at time-of-transaction.
- Pro-rating — partial months, plan changes mid-cycle.
- Credits and refunds — model these as separate event types with explicit references to the original charge.
- Latency tolerance — billing data must be eventually accurate, often within hours. Real-time isn't required, but reconciliation matters.
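A sketch of an append-only billing events table with an idempotency key and explicit correction links; the table and column names are illustrative rather than a prescribed schema:

CREATE TABLE fct_billing_events (
  billing_event_id    STRING,         -- surrogate key
  idempotency_key     STRING,         -- unique per source event; enforce uniqueness in the pipeline
  customer_id         STRING,
  event_type          STRING,         -- 'charge' | 'credit' | 'refund' | 'correction'
  references_event_id STRING,         -- for credits/refunds/corrections: the original charge
  amount_usd          DECIMAL(18,8),  -- accounting currency
  amount_billed       DECIMAL(18,8),  -- customer's currency
  currency            STRING,
  fx_rate             DECIMAL(18,8),  -- rate at time of transaction
  occurred_at         TIMESTAMP,
  recorded_at         TIMESTAMP       -- append time; rows are never updated or deleted
);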
Multi-tenant analytics
Two distinct multi-tenant questions:
1. Customer-facing analytics (the data product)
If the platform shows each customer their own usage dashboard, you're effectively building a SaaS analytics product. Concerns:
- Row-level security / tenant isolation — customer A must never see customer B's data.
- Latency requirements — customer-facing dashboards expect sub-second response. Pre-aggregate + cache (see the sketch after this list).
- Custom rollups — customers may want to slice by their own tags / API keys.
- Embedded BI — Looker, Cube, ClickHouse + a thin app — pick a serving pattern early.
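One way to back the customer-facing dashboard is a small pre-aggregated serving table that the application (or a row-level security policy) filters by the authenticated tenant; a sketch under that assumption:

-- Daily usage per tenant, small enough for a fast OLAP store or a cached serving layer.
-- The app (or RLS) must filter by the caller's customer_id; never serve this unfiltered.
CREATE TABLE serve_customer_daily_usage AS  -- hypothetical serving table
SELECT
  customer_id,
  api_key_id,
  model_id,
  date_key,
  COUNT(*)           AS requests,
  SUM(input_tokens)  AS input_tokens,
  SUM(output_tokens) AS output_tokens,
  SUM(revenue_usd)   AS billed_usd
FROM fct_inference_requests
GROUP BY 1, 2, 3, 4;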
2. Internal cross-tenant analytics (the data team's work)
Finance / product / ops need cross-tenant analyses: "top 10 customers by usage", "new-customer cohort retention by signup month." Concerns:
- Access controls — who internally can query PII vs aggregated?
- Audit logs — who queried what when.
- Tokenization where customer identities aren't needed.
Key metrics to be conversational about
Revenue / customer
- ARR / MRR — but careful, usage-based pricing makes ARR a derived projection.
- Revenue per customer per month — clear, used everywhere.
- NRR (Net Revenue Retention) — same-customer revenue growth, including churn (sketched after this list).
- GRR (Gross Revenue Retention) — same-customer retention without expansion; always at or below NRR.
- Concentration — top-N customers' share of revenue.
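A monthly NRR sketch built from the request fact; DATEADD syntax varies by warehouse:

-- NRR for month M: revenue in M from customers who had revenue in M-1,
-- divided by those same customers' revenue in M-1. New customers are excluded;
-- churned customers count as zero in the numerator.
WITH monthly AS (
  SELECT customer_id,
         DATE_TRUNC('month', occurred_at) AS month,
         SUM(revenue_usd)                 AS revenue_usd
  FROM fct_inference_requests
  GROUP BY 1, 2
)
SELECT
  DATEADD('month', 1, prev.month)           AS month,
  SUM(COALESCE(curr.revenue_usd, 0))
    / NULLIF(SUM(prev.revenue_usd), 0)      AS nrr
FROM monthly AS prev
LEFT JOIN monthly AS curr
  ON curr.customer_id = prev.customer_id
 AND curr.month = DATEADD('month', 1, prev.month)
GROUP BY 1;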
Cost / infrastructure
- COGS — cost of goods sold; for infra companies, mostly GPU costs.
- Gross margin — revenue minus COGS.
- GPU utilization rate — fleet-wide, by region, by GPU type.
- Cost per token / per request / per GPU-hour.
Reliability / ops
- p50 / p95 / p99 latency — by model, by region (sketched after this list).
- TTFT (time to first token) — for streaming inference.
- Success rate / error rate — overall and per-error-class.
- GPU failure rate — per provider.
- Queue depth / wait time — when demand exceeds supply.
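A sketch of the daily latency and reliability rollup per model and region; percentile syntax varies by warehouse (PERCENTILE_CONT ... WITHIN GROUP here, approx_percentile or APPROX_QUANTILES elsewhere):

-- Daily latency distribution and success rate per model and region.
SELECT
  date_key,
  model_id,
  region_id,
  PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY latency_ms) AS p50_latency_ms,
  PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY latency_ms) AS p95_latency_ms,
  PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY latency_ms) AS p99_latency_ms,
  PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY ttft_ms)    AS p95_ttft_ms,
  AVG(CASE WHEN status = 'success' THEN 1.0 ELSE 0.0 END)  AS success_rate
FROM fct_inference_requests
GROUP BY 1, 2, 3;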
Customer behavior
- Activation — first meaningful request after signup.
- Time-to-first-value — signup to first successful production-volume usage.
- Cohort retention — % of signup cohort still active at month N (sketched after this list).
- Model adoption — share of usage by model.
- Workload mix — chat vs batch vs streaming vs fine-tuning.
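A cohort-retention sketch assuming a dim_customers table with one row per customer and a signup_date; "active" is simplified to at least one successful request in the month, and DATEDIFF syntax varies by warehouse:

-- % of each signup cohort still active N months after signup.
WITH activity AS (
  SELECT DISTINCT customer_id,
         DATE_TRUNC('month', occurred_at) AS active_month
  FROM fct_inference_requests
  WHERE status = 'success'
),
cohorts AS (
  SELECT customer_id,
         DATE_TRUNC('month', signup_date) AS cohort_month  -- dim_customers is assumed
  FROM dim_customers
),
cohort_sizes AS (
  SELECT cohort_month, COUNT(*) AS cohort_size
  FROM cohorts
  GROUP BY 1
)
SELECT
  c.cohort_month,
  DATEDIFF('month', c.cohort_month, a.active_month)        AS months_since_signup,
  COUNT(DISTINCT a.customer_id) * 1.0 / MAX(s.cohort_size) AS retention_pct
FROM cohorts c
JOIN activity a     ON a.customer_id = c.customer_id
JOIN cohort_sizes s ON s.cohort_month = c.cohort_month
GROUP BY 1, 2;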
Interview talking points
"Three groups. First, unit economics — gross margin per customer per model per region. That's the question executives lose sleep over and it requires modeling the data right end-to-end. Second, utilization — both GPU compute utilization and 'are we serving requests on the GPUs we're paying for.' Idle GPU-hours are pure cost. Third, reliability — p99 latency, success rate, TTFT for streaming. Infra customers measure us on these. Most other metrics — onboarding funnel, customer retention — apply at any SaaS but matter especially because expansion revenue at infra companies depends on customer workloads scaling, which depends on the platform being trustworthy."
"First — is it a real signal or a data issue? Check the data pipeline, source freshness, and any recent model or metric changes. If real, decompose. Margin = revenue minus cost; which side moved? If revenue, segment by customer and model — did a high-margin customer churn or downgrade? If cost, segment by GPU provider, region, model — did a region's GPU costs spike, did utilization drop somewhere? The decomposition almost always points at a specific tenant, region, or model. Once isolated, the question shifts from analytics to ops — why did that change."
"Don't serve raw — pre-aggregate. Raw stays in cheap object storage, partitioned by date and customer. Aggregations live in the warehouse: 5-minute aggs for ops dashboards, hourly for cost analysis, daily for retention and finance. Most queries hit aggregations; raw is for spot debugging with strict partition filters. For sub-second customer-facing dashboards, push the daily aggs to a fast OLAP store — ClickHouse or warehouse materialized views. Two anti-patterns to avoid: scanning raw without partition filters (one query can cost more than a month of compute), and over-aggregating so analysts can't answer new questions without a heavy re-aggregation job."