Section C · Commercial

Infrastructure

Together blends owned hardware with strategic partner capacity. The hybrid model keeps capital efficiency higher than that of pure-owned competitors while supporting the operational quality enterprise customers expect.

The hybrid infrastructure model

Together's infrastructure is a mix of:

Owned GPU hardware in partner datacenters.
Capacity rented or reserved from major neocloud partners.
Strategic partnerships for specific deployments.

The blended approach lets Together scale faster than pure-owned models would allow while maintaining operational control over the customer-facing platform.

Owned hardware

For high-utilization inference workloads, owning the GPUs delivers better unit economics than renting. Together has built up substantial owned capacity over 2023-2026 for this purpose. The owned fleet:

Concentrated on current-generation datacenter cards (H100/H200/Blackwell).
Located in partner datacenters with appropriate networking.
Operated under Together's software stack.

The owned-hardware investment is one of the larger capital uses for the funding Together has raised.
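The unit-economics argument for owning high-utilization capacity can be made concrete with a break-even calculation. The sketch below is illustrative only: the purchase price, lifetime, operating cost, and rental rate are hypothetical placeholders, not Together's figures, and it assumes rented capacity is billed only for the hours actually used.

```python
# Break-even utilization for owning vs. renting a GPU.
# All dollar figures below are hypothetical placeholders, not Together's numbers.

HOURS_PER_YEAR = 365 * 24


def owned_cost_per_hour(capex_usd: float, lifetime_years: float,
                        opex_per_hour_usd: float) -> float:
    """Hourly cost of an owned GPU, paid whether or not the GPU is busy.

    Capex is amortized linearly over the assumed lifetime; opex covers
    colocation, power, and operations (assumed flat per hour).
    """
    return capex_usd / (lifetime_years * HOURS_PER_YEAR) + opex_per_hour_usd


def break_even_utilization(owned_hourly_usd: float,
                           rental_rate_usd: float) -> float:
    """Utilization above which owning is cheaper than renting.

    Assumes rental is billed only for hours used, so the cost per useful
    hour is rental_rate for rented GPUs and owned_hourly / utilization
    for owned GPUs. Setting the two equal gives the threshold.
    """
    return owned_hourly_usd / rental_rate_usd


# Hypothetical inputs for illustration.
owned_hourly = owned_cost_per_hour(capex_usd=30_000, lifetime_years=4,
                                   opex_per_hour_usd=0.50)
threshold = break_even_utilization(owned_hourly, rental_rate_usd=2.50)

print(f"owned cost per hour: ${owned_hourly:.2f}")
print(f"break-even utilization: {threshold:.0%}")
```

Under these assumed inputs, owning wins once a GPU is busy a little more than half the time, which is why the argument applies to high-utilization inference rather than to bursty workloads.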

Partner capacity

For workloads that exceed owned capacity, Together draws on partnerships across the broader neocloud ecosystem. The capacity source is invisible to customers: they see "Together inference" regardless of which underlying capacity serves a request.

This partnership flexibility is similar to RunPod Serverless's approach: the platform abstracts away the question of which underlying capacity serves a request.
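One way to picture this abstraction is a dispatcher that prefers owned capacity and overflows to partner capacity, while the customer-facing response never names the pool. The following is a minimal sketch under that assumption; the `Pool` and `Router` classes and the pool names are hypothetical and do not describe Together's actual routing software.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """A block of serving capacity (hypothetical model)."""
    name: str          # internal label, e.g. "owned" or "partner"
    capacity: int      # maximum concurrent requests
    in_flight: int = 0

    def has_room(self) -> bool:
        return self.in_flight < self.capacity


class Router:
    """Dispatches each request to the first pool with free capacity."""

    def __init__(self, pools):
        # Pools are tried in the order given, so list owned capacity first.
        self.pools = list(pools)

    def dispatch(self) -> str:
        """Reserve a slot and return the pool name for internal accounting."""
        for pool in self.pools:
            if pool.has_room():
                pool.in_flight += 1
                return pool.name
        raise RuntimeError("no capacity available in any pool")

    def serve(self, prompt: str) -> dict:
        """Customer-facing call: the response does not reveal the pool."""
        self.dispatch()
        return {"service": "inference", "prompt": prompt}


router = Router([Pool("owned", capacity=2), Pool("partner", capacity=1)])
placements = [router.dispatch() for _ in range(3)]
print(placements)
```

The first two requests land on the owned pool and the third overflows to the partner pool; a caller of `serve` would see the same response shape in either case.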

GPU mix

Together's GPU footprint spans:

H100 (the workhorse during the 2023-2024 ramp).
H200 for larger-memory inference and fine-tuning.
B200 (Blackwell) for newer deployments.
Some A100 legacy capacity.

Consumer cards aren't in the mix. The inference platform's economics work better on datacenter-grade hardware where serving optimization can extract maximum throughput.

Network

Inference workloads have different network needs from training:

Lower inter-GPU bandwidth requirements for many inference patterns.
Higher API-traffic bandwidth on the external side.
Latency-sensitive networking between regions for a global customer base.

For training clusters, the network requirements match the broader enterprise-neocloud picture — InfiniBand fabric for multi-node training.

Scaling capacity

Together's capacity scales in step with traffic: as API traffic grows, more capacity comes online. The blended owned/partner model means scaling can happen at multiple speeds. Partners absorb fast spikes, while owned capacity provides the baseline.
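The baseline-and-spike split can be illustrated with a small capacity-planning calculation. This is a sketch with made-up numbers: the hourly demand series and the owned-fleet size are hypothetical, and the function simply assumes owned GPUs are filled first and partners cover any remainder.

```python
def split_demand(demand, owned_capacity):
    """Split per-interval GPU demand into owned-served and partner-served parts.

    Owned capacity is filled first; anything above it overflows to partners.
    """
    owned = [min(d, owned_capacity) for d in demand]
    partner = [max(d - owned_capacity, 0) for d in demand]
    return owned, partner


# Hypothetical GPUs needed in each of seven consecutive hours.
demand = [60, 80, 100, 140, 220, 120, 90]
OWNED_CAPACITY = 100  # hypothetical size of the owned fleet

owned, partner = split_demand(demand, OWNED_CAPACITY)

# Share of owned GPU-hours that were actually used over the period.
owned_utilization = sum(owned) / (OWNED_CAPACITY * len(demand))

print("owned:  ", owned)
print("partner:", partner)
print(f"owned utilization: {owned_utilization:.0%}")
```

Sizing the owned fleet near typical demand keeps its utilization high, while the spike hours are the only ones that draw on partner capacity.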

Takeaway

Together's infrastructure model is hybrid by design — owned capacity for capital-efficient base load; partner capacity for flexibility and spikes. The approach is less capital-intensive than pure-owned neoclouds and gives Together strategic flexibility. The next chapter looks at the customer base.