Read first

Start Here

Eleven-chapter profile of Together.AI, the managed-inference platform built on top of GPU infrastructure. It combines research credibility (FlashAttention, RedPajama) with per-token pricing on curated open-source models.

Scope & audience

Covers Together's product (inference, fine-tuning, and training clusters), open-source strategy, research lineage, infrastructure, customers, and how it competes against both other inference platforms (Fireworks, Anyscale, Lepton) and direct API competitors (OpenAI, Anthropic).

Key framings to carry

  1. Together is an inference platform first, GPU cloud second. The product is priced per token; the GPU economics are an implementation detail. This is structurally different from GPU rental providers such as Vast / RunPod / Lambda.
  2. Research credibility is a commercial asset. Its ties to the FlashAttention authors and its RedPajama work give Together a quality halo that is hard for competitors to manufacture.
  3. Open-source curation is the wedge. The bet is that customers want hosted Llama / Mixtral / Qwen / DeepSeek with quality engineering on top, at a price that beats hyperscaler-hosted equivalents.

Reading order

01 (history) → 02 (product surface) → 03 (open-source strategy) → 04 (research credibility) → 05 (pricing) → 06–08 (infra, customers, positioning) → 09–10 (financials + outlook).