Build what's next on the AI Native Cloud. Full-stack AI platform for inference, fine-tuning, and GPU clusters — powered by cutting-edge research.
⚡️ FlashAttention-4: up to 1.3× faster than cuDNN on NVIDIA Blackwell
Introducing Together AI's new look
🔎 ATLAS: runtime-learning accelerators delivering up to 4× faster LLM inference
⚡ Together GPU Clusters: self-service NVIDIA GPUs, now generally available
📦 Batch Inference API: process billions of tokens at 50% lower cost for most models
🪛 Fine-Tuning Platform upgrades: larger models, longer contexts

The full-stack platform for production AI, powered by cutting-edge systems research, helping teams ship faster, scale reliably, and achieve superior unit economics.

Open and responsible development: everything works best when we help the open-source community work better together. Our wonder, curiosity, and hope drive us to find ways to make everyone's lives better. We are optimizers, making the most of what we have and not taking more than we need. We build everything with the purpose of benefiting society.

Featured partners that help us scale.

Meet our leaders, researchers, and engineers building the systems behind Together AI: Senior Director of People Ops, SVP of Engineering Infrastructure, VP of Technical Program Management.
Industry: information technology & services
Employees: 380
Funding stage: Series B
Total funding: $533.5M
Introducing Mamba-3 🐍 Inference speeds are more important than ever, driven by the rise in agents and inference-heavy RL rollouts. Linear models are fast in FLOPs but memory-bound during decode. Mamba-3's MIMO (multi-input, multi-output) variant fixes this: swap the recurrence from vector outer-product to matrix multiply, and you get a stronger model at the same decode speed. Fastest prefill+decode at 1.5B. Beats Mamba-2, GDN, and Llama-3.2-1B. Kernels open-sourced. #mamba3 #togetherresearch Congratulations to the team leading this research: @aakash_lahoti @kevinyli_ @_berlinchen @caitWW9 @tri_dao @_albertgu
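The recurrence change described above can be sketched in a few lines of NumPy: the Mamba-2-style step updates the state with a rank-1 vector outer product, while the MIMO variant expresses a rank-r update as a matrix multiply over the same state shape. The sizes and function names below are illustrative only, not Mamba-3's actual dimensions or kernels.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, r = 16, 8, 4   # state size, head dim, MIMO rank (illustrative values)

def step_rank1(h, a, b, x, c):
    """Mamba-2-style step: rank-1 (vector outer-product) state update."""
    h = a * h + np.outer(b, x)   # (n, p) state, updated by a rank-1 term
    y = c @ h                    # read out with a length-n vector -> (p,)
    return h, y

def step_mimo(h, a, B, X, C):
    """MIMO-style step: rank-r update expressed as a matrix multiply."""
    h = a * h + B @ X            # B: (n, r), X: (r, p) -> same state shape (n, p)
    Y = C @ h                    # C: (r, n) -> (r, p) outputs per step
    return h, Y

# With r = 1 the MIMO step reduces exactly to the rank-1 step.
h0 = np.zeros((n, p))
a = 0.9
b, x, c = rng.normal(size=n), rng.normal(size=p), rng.normal(size=n)
h1, y1 = step_rank1(h0, a, b, x, c)
h2, Y2 = step_mimo(h0, a, b[:, None], x[None, :], c[None, :])
assert np.allclose(h1, h2) and np.allclose(y1, Y2[0])
```

The point of the matmul form is hardware efficiency: a rank-r update moves the same state once per step but does r times more useful arithmetic, which is why decode stays memory-bound-fast while the model gets stronger.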
See where you can catch us next: https://t.co/6X11GmtPI0
That’s a wrap on HumanX. Custom comics, hats, a happy hour with @getmetronome & @nvidia, and two sessions on what actually matters for AI-native builders. #HumanX #TogetherAI #AINativeCloud https://t.co/lOTNY8bhDp
Gemma 4 31B brings dense multimodal reasoning to Together AI. Try Now: https://t.co/Xx1rbOe7m4
Highlights: 👉 Configurable thinking mode for step-by-step reasoning 👉 Multimodal understanding with text and image input, including document parsing and OCR 👉 Native function calling with structured tool use for agent workflows 👉 Production-ready on the AI Native Cloud—99.9% SLA, 256K context, and support for 140+ languages
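As a minimal sketch of trying such a model, the request below targets Together's OpenAI-compatible chat completions endpoint. The model ID string is a placeholder for illustration, not a confirmed identifier; check the Together models page for the exact ID before use.

```python
import json
import urllib.request

# Hypothetical model ID for Gemma 4 31B; verify the real one on the models page.
payload = {
    "model": "google/gemma-4-31b-it",
    "messages": [
        {"role": "user", "content": "Parse this receipt image description and list the line items."}
    ],
    "max_tokens": 256,
}

def build_request(api_key: str) -> urllib.request.Request:
    """Build a chat-completions request against Together's OpenAI-compatible API."""
    return urllib.request.Request(
        "https://api.together.xyz/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# To actually send it (requires a valid key and network access):
# resp = json.load(urllib.request.urlopen(build_request(os.environ["TOGETHER_API_KEY"])))
# print(resp["choices"][0]["message"]["content"])
```

The same payload shape extends to the function-calling and structured-output features listed above via the standard `tools` and `response_format` fields of OpenAI-compatible APIs.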
Introducing Gemma 4 31B from @GoogleDeepMind on Together AI. AI natives can now use Gemma 4 31B on Together and benefit from reliable inference for multimodal reasoning, tool use, and agentic workflows. https://t.co/g9oyqiG56C
GLM-5.1 gives teams a stronger model for coding, tool use, and sustained agent performance on Together AI. Learn more: https://t.co/GJlBvGVRWC
Highlights: 👉 28% coding improvement over GLM-5 with refined RL post-training 👉 Better long-horizon execution across hundreds of rounds and thousands of tool calls 👉 Thinking mode, tool calling, and structured JSON output for agent pipelines 👉 Production-ready on the AI Native Cloud—99.9% SLA, serverless and dedicated options
Introducing GLM-5.1 from @Zai_org on Together AI. AI natives can now use GLM-5.1 on Together and benefit from reliable inference for production-scale agentic engineering and long-horizon coding workflows. https://t.co/8vSsGhciqg
https://t.co/OK7Qf267hX
Blog: https://t.co/6GF8qCUeV4 Paper: https://t.co/oRMiQAzAts Code: https://t.co/Vffm57gMIV
New from Together Research: LLMs can fix query plans your database optimizer gets wrong. Up to 4.78x faster. Cost estimators fail when they miss semantic correlations: wrong join order, wrong access path, cascading errors. DBPlanBench feeds DataFusion's physical operator graph to an LLM, which patches the plan directly instead of regenerating it from scratch. On TPC-H / TPC-DS: → 4.78x peak speedup → 60.8% of queries improved >5% → Build memory: 3.3 GB → 411 MB Optimize on small-scale data, transfer to production.
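The "patch the plan instead of regenerating it" idea can be pictured with a toy example. This is not the DBPlanBench implementation; it is a hypothetical sketch where a physical plan is a small tree and the fix is one targeted edit, here swapping the hash-join build side to the semantically smaller input.

```python
# Toy physical plan: a hash join that builds on the large table,
# the kind of mistake a cost estimator can make when it misses
# semantic correlations between columns.
plan = {
    "op": "hash_join",
    "build": {"op": "scan", "table": "orders"},   # large build side
    "probe": {"op": "scan", "table": "nation"},   # small probe side
}

def patch_swap_join_sides(node: dict) -> dict:
    """Apply one targeted edit in place: build the hash on the smaller input.

    A plan-patching approach emits small edits like this against the
    existing operator graph rather than regenerating the whole plan.
    """
    if node.get("op") == "hash_join":
        node["build"], node["probe"] = node["probe"], node["build"]
    return node

patched = patch_swap_join_sides(plan)
assert patched["build"]["table"] == "nation"
assert patched["probe"]["table"] == "orders"
```

Building the hash table on the smaller input is what drives memory improvements of the kind quoted above (a multi-GB build side shrinking to hundreds of MB), since the build side is the one materialized in memory.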
Introducing Wan 2.7 from @alibaba_cloud on Together AI. AI natives can now build with Wan 2.7 on Together AI and get a clearer path from first-generation video to continuation, reference-driven control, and editing on one production platform. https://t.co/BXJPCaiyWM
Wan 2.7 gives teams more control over how video is generated, continued, and refined on Together AI. Learn more: https://t.co/mBYATCvTRA
Highlights: 👉 Text-to-video available now with 720P/1080P output, 2–15 second duration, and optional audio input 👉 More workflow control — continue scenes, steer outputs with references, and revise without restarting from scratch 👉 More of the suite coming soon — image-to-video, reference-to-video, and video edit 👉 Production-ready on the AI Native Cloud — 99.9% SLA, serverless inference, and enterprise deployment options
Highlights: 👉 Flux: conversational STT with 250ms end-of-turn detection using conversational context 👉 Nova-3 + Nova-3 Multilingual: production transcription for noisy, real-world, and multilingual audio 👉 Aura-2: enterprise TTS with sub-200ms TTFB for clear, structured speech 👉 Production-ready on Together AI: dedicated infrastructure, 99.9% SLA, SOC 2 Type II
Together Inference offers a free tier.
Based on 55 social mentions analyzed, sentiment is 0% positive, 100% neutral, and 0% negative.