The Inference Service
Hyperbolic's inference service offers per-token API access to curated open-source models. Smaller catalog and traffic than Together; competitive on price; growing.
Inference API
Standard OpenAI-API-compatible inference endpoints. Customers point their OpenAI client at Hyperbolic's endpoint and run.
Model catalog
Curated open-source models including:
- Llama family.
- DeepSeek variants.
- Qwen.
- Mixtral / Mistral.
- Other selected open-source releases.
Catalog is smaller than Together's. The selection prioritizes the highest-traffic models.
Per-token pricing
Pricing is competitive with the broader open-source inference category. Often comparable to Together's or slightly below for specific models. The smaller scale means Hyperbolic has less serving optimization headroom but also lower cost overhead.
vs Together / Fireworks
- Together and Fireworks are larger by traffic.
- Hyperbolic's research credibility and dual-product story are unique.
- Per-token pricing and quality are competitive on most overlapping models.
Position in inference category
Hyperbolic is a credible second-tier player in the managed-inference category. Specific use cases (cost-sensitive inference, ease of integrating with raw-GPU rental) bring customers; the broader category leader pull means Hyperbolic doesn't dominate.
Takeaway
The inference service is a competent product but doesn't lead its category. The next chapter examines the underlying infrastructure.