Section B · The inference service

The Inference Service

Hyperbolic's inference service offers per-token API access to curated open-source models. Smaller catalog and traffic than Together; competitive on price; growing.

Inference API

Standard OpenAI-API-compatible inference endpoints. Customers point their OpenAI client at Hyperbolic's endpoint and run.

Model catalog

Curated open-source models including:

  • Llama family.
  • DeepSeek variants.
  • Qwen.
  • Mixtral / Mistral.
  • Other selected open-source releases.

Catalog is smaller than Together's. The selection prioritizes the highest-traffic models.

Per-token pricing

Pricing is competitive with the broader open-source inference category. Often comparable to Together's or slightly below for specific models. The smaller scale means Hyperbolic has less serving optimization headroom but also lower cost overhead.

vs Together / Fireworks

  • Together and Fireworks are larger by traffic.
  • Hyperbolic's research credibility and dual-product story are unique.
  • Per-token pricing and quality are competitive on most overlapping models.

Position in inference category

Hyperbolic is a credible second-tier player in the managed-inference category. Specific use cases (cost-sensitive inference, ease of integrating with raw-GPU rental) bring customers; the broader category leader pull means Hyperbolic doesn't dominate.

Takeaway

The inference service is a competent product but doesn't lead its category. The next chapter examines the underlying infrastructure.