Start Here
Eleven-chapter profile of Together.AI, the managed-inference platform built on GPU infrastructure. It combines research credibility (FlashAttention, RedPajama) with per-token pricing on curated open-source models.
Scope & audience
Together's product (inference + fine-tuning + training clusters), open-source strategy, research lineage, infrastructure, customers, and how it competes against both other inference platforms (Fireworks, Anyscale, Lepton) and direct API competitors (OpenAI, Anthropic).
Key framings to carry
- Together is an inference platform first and a GPU cloud second. The product is sold per token; the GPU economics are an implementation detail. This is structurally different from Vast / RunPod / Lambda, which primarily rent out GPU capacity.
- Research credibility is commercial. FlashAttention authorship and the RedPajama work give Together a quality halo that competitors find hard to manufacture.
- Open-source curation is the wedge. The bet is that customers want hosted Llama / Mixtral / Qwen / DeepSeek with quality engineering on top, at a price that beats hyperscaler-hosted equivalents.
Reading order
01 (history) → 02 (product surface) → 03 (open-source strategy) → 04 (research credibility) → 05 (pricing) → 06–08 (infra, customers, positioning) → 09–10 (financials + outlook).