Research Credibility
Together AI's research output is unusually deep for an early-stage commercial company. FlashAttention, RedPajama, StripedHyena, and its speculative decoding work add up to a credibility that translates directly into commercial advantage.
FlashAttention
Co-founder Tri Dao authored FlashAttention while at Stanford. It's one of the most-used pieces of modern transformer infrastructure:
- An exact attention algorithm that is both faster and more memory-efficient than standard implementations, because it avoids materializing the full attention matrix in GPU memory.
- Adopted in PyTorch, the major training frameworks, and most production inference serving stacks.
- FlashAttention-2 and FlashAttention-3 followed with further improvements.
FlashAttention isn't a Together product per se; it's a research contribution from a Together co-founder. The strategic value is brand association — Together is "the FlashAttention company" in the technical community.
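The memory saving comes from consuming keys and values in blocks while keeping a running softmax normalizer, so the full row of attention scores never has to be stored at once. The sketch below illustrates that idea in plain Python for a single query vector, with the usual 1/sqrt(d) scaling omitted for brevity. The function names and block size are illustrative assumptions, and this is not the FlashAttention kernel itself, which fuses these steps on the GPU.

```python
import math


def naive_attention(q, keys, values):
    """Reference version: compute every score first, then softmax, then mix values."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def tiled_attention(q, keys, values, block_size=2):
    """Same result, but only one block of scores exists at any time."""
    dim = len(values[0])
    running_max = float("-inf")   # largest score seen so far
    running_sum = 0.0             # softmax denominator, relative to running_max
    acc = [0.0] * dim             # unnormalized weighted sum of values

    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in k_blk]

        # Rescale earlier partial results to the new maximum.
        new_max = max(running_max, max(scores))
        scale = math.exp(running_max - new_max)  # 0.0 on the first block
        running_sum *= scale
        acc = [a * scale for a in acc]

        # Fold this block into the running totals.
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            for d in range(dim):
                acc[d] += w * v[d]
        running_max = new_max

    return [a / running_sum for a in acc]
```

Both functions return the same vector up to floating-point rounding. The tiled version never holds more than block_size scores, which is the property that lets the real kernel keep its working set in fast on-chip memory.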
RedPajama
Released in 2023, RedPajama was an open-source reproduction of the LLaMA training dataset (the original Meta dataset was not released publicly). The release:
- Demonstrated the scale of dataset engineering required for frontier-quality models.
- Provided a reusable dataset for the broader community.
- Built Together's credibility as a contributor to the open-source ecosystem.
RedPajama-INCITE models (open-source models trained on the dataset) followed and were widely used in 2023.
StripedHyena and architecture work
Together's research has extended into architecture innovation:
- StripedHyena explored a hybrid of gated convolutions and attention as an alternative to attention-only Transformers, with implications for long-context efficiency.
- Contributions to hybrid architectures that combine Mamba-style state-space layers with attention.
- Continued investigation of efficiency / quality tradeoffs.
The architecture research keeps Together at the forefront of the open-source architecture discussion.
Serving and inference research
Practical serving research drives the platform's economics:
- Speculative decoding, including the tree-based Sequoia algorithm, accelerates inference.
- Continuous batching and request scheduling research.
- Quantization techniques that maintain quality at reduced memory cost.
- Multi-LoRA serving — running many fine-tuned variants on a single base model efficiently.
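The arithmetic behind multi-LoRA serving can be sketched briefly. Every request shares one set of base weights, and each fine-tuned variant contributes only a small low-rank correction chosen per request. The code below is a minimal illustration under that assumption, using plain Python lists. The function names, adapter names, and data layout are hypothetical, and production systems batch the base computation across requests and use specialized GPU kernels rather than a Python loop.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def multi_lora_forward(base_w, adapters, batch):
    """Apply one shared base layer plus a per-request low-rank adapter.

    base_w:   d_out x d_in weight matrix shared by every request.
    adapters: maps adapter name -> (a, b), with a of shape r x d_in
              and b of shape d_out x r, where r is the small LoRA rank.
    batch:    list of (adapter_name, input_vector) pairs.
    """
    outputs = []
    for name, x in batch:
        y = matvec(base_w, x)               # shared base weights
        a, b = adapters[name]
        delta = matvec(b, matvec(a, x))     # low-rank correction B(Ax)
        outputs.append([yi + di for yi, di in zip(y, delta)])
    return outputs
```

Because each adapter stores only the two thin matrices, many variants fit in memory next to a single copy of the base model, which is where the efficiency in the bullet above comes from.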
These contributions are less academically visible but commercially direct. They make Together's per-GPU economics better than those of less research-driven competitors.
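To make the economics concrete, consider speculative decoding from the list above. A small draft model guesses several tokens ahead, and the large target model checks them together, so the expensive model runs fewer times per generated token. The sketch below shows only the simplest greedy variant, with the two models passed in as hypothetical callables that map a token list to the next token. Published methods verify with rejection sampling to preserve the target's output distribution, and Sequoia organizes candidates as a tree; neither is implemented here.

```python
def speculative_decode(target_next, draft_next, prompt, max_new_tokens, k=4):
    """Greedy speculative decoding with toy next-token functions.

    Returns the generated sequence and the number of target verification
    passes, which stands in for the expensive-model cost.
    """
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    target_passes = 0

    while len(tokens) < goal:
        # 1. The cheap draft model proposes k tokens one after another.
        proposal = []
        for _ in range(k):
            proposal.append(draft_next(tokens + proposal))

        # 2. The target checks every proposed position. A real system scores
        #    all positions in one batched forward pass; here it is a loop.
        target_passes += 1
        accepted = []
        for guess in proposal:
            expected = target_next(tokens + accepted)
            if guess == expected:
                accepted.append(guess)
            else:
                accepted.append(expected)   # replace the first wrong guess
                break
        else:
            # Every guess matched, so the same pass yields one extra token.
            accepted.append(target_next(tokens + accepted))

        tokens.extend(accepted)

    return tokens[:goal], target_passes
```

The output is identical to decoding greedily with the target model alone, since every kept token is one the target would have chosen. The gain is that each verification pass yields between one and k + 1 tokens, depending on how often the draft model agrees with the target.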
Commercial value of research
How does the research credibility translate to revenue?
- Sales credibility. Enterprise customers take Together more seriously because of the founders' technical reputation.
- Talent recruiting. Top ML systems engineers want to work where they can ship research alongside product.
- Customer confidence in quality. The "they invented FlashAttention" framing reassures customers that the serving stack is competent.
- Partner relationships. Model developers (Mistral, the Qwen team, DeepSeek, etc.) prefer to partner with research-credible platforms.
- Brand differentiation. Together's research-first identity differentiates it from purely commercial competitors such as Fireworks.
Talent moat
The research-friendly culture at Together attracts a specific kind of engineer: one who could work at OpenAI or Anthropic but prefers a smaller environment that encourages publishing research. That talent density is a real asset.
Competitors trying to match Together's serving efficiency need similar talent; they're competing for the same scarce pool.
Takeaway
Together's research output is a meaningful commercial asset. It builds brand, drives talent acquisition, and improves the platform's technical performance. The combination of research credibility and commercial execution distinguishes Together from competitors who focus on one or the other. The next chapter examines pricing — how that performance turns into commercial position.