Section B · Product

Product Surface

Together's product spans inference API, dedicated endpoints, fine-tuning, and full training clusters. The breadth lets customers grow with the platform from initial experimentation to production scale.

Inference API

The flagship product. Per-token, OpenAI-API-compatible inference on a curated catalog of open-source models. Customers point their OpenAI client at Together's endpoint, change the model name, and they're running.

The model catalog includes:

Llama family (Meta's open-weight models, including the latest releases).
Mistral and Mixtral.
Qwen (Alibaba's strong multilingual open-source line).
DeepSeek (Chinese open-source releases that have made significant capability progress).
Various fine-tuned variants and instruction-tuned versions.
Specialized models (code, math, vision-language).

The API is the easiest entry point for any customer who's been using OpenAI's API and wants to try open-source alternatives.

Dedicated endpoints

For customers whose workload doesn't fit the per-token model — high throughput, predictable load, custom models — Together offers dedicated endpoints. The customer reserves dedicated GPU capacity; Together operates it; the customer pays a flat rate.

This is the bridge between pure per-token API and full dedicated GPU rental. Customers who outgrow the API but don't want to manage their own infrastructure land here.

Fine-tuning

Together provides managed fine-tuning for open-source models. The customer uploads training data, picks a base model, runs the fine-tune; Together handles the infrastructure and returns a deployable model. The result can be served via the API.

The fine-tuning product captures customers who want to customize models without building training infrastructure themselves.

Training clusters

For customers doing serious training (not just fine-tuning), Together offers dedicated training clusters. These are configured with InfiniBand fabric for multi-node distributed training, and Together provides operational support during the training run.

The training-cluster product overlaps with what CoreWeave and Crusoe offer. Together's positioning is different — it's bundled with the broader inference / fine-tuning lifecycle.

Code / agentic offerings

Together has invested in code-specific and agentic-AI products:

Hosted code-generation models.
Specialized agentic stacks for autonomous AI workflows.
Integration with developer tooling.

These vertical products extend Together's TAM beyond raw inference into specific use-case platforms.

Software stack

Together's inference stack reflects the research lineage:

FlashAttention integration for attention efficiency.
Custom kernels for high-throughput serving.
Speculative decoding optimizations.
Quantization options for memory efficiency.
Batching and scheduling for high throughput at controlled latency.

The performance advantage from this stack vs naive serving is significant. Together can charge competitive prices and still maintain margin because the platform extracts more throughput per GPU than less-optimized alternatives.

Takeaway

Together's product surface is broader than a pure inference API — it spans the full lifecycle of open-source-model use, from API access through fine-tuning to training. The breadth supports customer lifetime value as workloads grow. The next chapter examines the open-source strategy that ties it together.