Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought
Tencent Hunyuan Team: Ao Liu, Botong Zhou, Can Xu, Chayse Zhou, ChenChen Zhang, Chengcheng Xu, Chenhao Wang, Decheng Wu, Dengpeng Wu, Dian Jiao, Dong Du, Dong Wang, Feng Zhang, Fengzong Lian, Guanghui Xu, Guanwei Zhang, Hai Wang, Haipeng Luo, Han Hu, Huilin Xu, Jiajia Wu

TL;DR
Hunyuan-TurboS is a large hybrid Transformer-Mamba MoE model that pairs Mamba's long-sequence efficiency with the Transformer's contextual understanding. An adaptive long-short chain-of-thought mechanism and pre-training on 16T tokens let it reach top-tier performance at lower inference cost.
Contribution
This paper introduces Hunyuan-TurboS, the first industry-deployed large-scale Mamba model, integrating adaptive CoT and multi-stage reinforcement learning for improved efficiency and reasoning.
Findings
Ranks in the top 7 on LMSYS Chatbot Arena
77.9% average score across 23 benchmarks
Outperforms leading models like Gemini-2.0-Flash-001
Abstract
As Large Language Models (LLMs) rapidly advance, we introduce Hunyuan-TurboS, a novel large hybrid Transformer-Mamba Mixture of Experts (MoE) model. It synergistically combines Mamba's long-sequence processing efficiency with Transformer's superior contextual understanding. Hunyuan-TurboS features an adaptive long-short chain-of-thought (CoT) mechanism, dynamically switching between rapid responses for simple queries and deep "thinking" modes for complex problems, optimizing computational resources. Architecturally, this 56B activated (560B total) parameter model employs 128 layers (Mamba2, Attention, FFN) with an innovative AMF/MF block pattern. Faster Mamba2 ensures linear complexity, Grouped-Query Attention minimizes KV cache, and FFNs use an MoE structure. Pre-trained on 16T high-quality tokens, it supports a 256K context length and is the first industry-deployed large-scale Mamba…
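The architecture figures in the abstract can be checked with a little arithmetic. The sketch below is illustrative only: the abstract does not say how many AMF and MF blocks there are or how they are ordered, so `HYPOTHETICAL_SEQUENCE`, `expand_blocks`, and `activated_fraction` are assumptions chosen so that the layer count comes out to 128. It is not the paper's actual configuration.

```python
from collections import Counter

# Layer types named in the abstract: A = Attention, M = Mamba2, F = FFN (MoE).
BLOCKS = {
    "AMF": ["attention", "mamba2", "ffn_moe"],
    "MF": ["mamba2", "ffn_moe"],
}


def expand_blocks(block_sequence):
    """Flatten a sequence of block names ("AMF" or "MF") into a per-layer list."""
    layers = []
    for name in block_sequence:
        layers.extend(BLOCKS[name])
    return layers


def activated_fraction(activated_params, total_params):
    """Share of parameters used per token in an MoE model."""
    return activated_params / total_params


# ASSUMPTION: a made-up ordering with 8 AMF blocks and 52 MF blocks,
# picked only because 8 * 3 + 52 * 2 = 128 layers.
HYPOTHETICAL_SEQUENCE = (["AMF"] + ["MF"] * 6) * 8 + ["MF"] * 4

layers = expand_blocks(HYPOTHETICAL_SEQUENCE)
layer_counts = Counter(layers)

print(len(layers))                       # total layer count
print(dict(layer_counts))                # layers per type under this assumption
print(activated_fraction(56e9, 560e9))   # 56B activated of 560B total
```

Under these assumed counts, only a small share of layers are attention layers, which is the kind of layout that would keep the KV cache small. The parameter figures from the abstract imply that about one tenth of the model is active for any given token.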
Taxonomy
Topics: Topic Modeling · Big Data and Digital Economy · Artificial Intelligence in Healthcare and Education
Methods: Attention Is All You Need · Dense Connections · Mixture of Experts · Softmax · Feedforward Network · Grouped-query attention · Mamba: Linear-Time Sequence Modeling with Selective State Spaces
