Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters
Ailin Huang, Ang Li, Aobo Kong, Bin Wang, Binxing Jiao, Bo Dong, Bojun Wang, Boyu Chen, Brian Li, Buyun Ma, Chang Su, Changxin Miao, Changyi Wan, Chao Lou, Chen Hu, Chen Xu, Chenfeng Yu, Chengting Feng, Chengyuan Yao, Chunrui Han, Dan Ma, Dapeng Shi, Daxin Jiang, Dehua Ma

TL;DR
Step 3.5 Flash is a sparse Mixture-of-Experts model with 196B total parameters, of which 11B are active, optimized for efficient inference and agentic tasks. It demonstrates strong performance across agent, coding, and math benchmarks.
Contribution
It introduces a scalable reinforcement learning framework and optimization techniques for frontier-level sparse models, enabling efficient, reliable, and self-improving agentic intelligence.
Findings
Achieves high accuracy on multiple agentic and mathematical benchmarks.
Demonstrates stable large-scale off-policy reinforcement learning.
Redefines the efficiency frontier for deploying sophisticated agents.
Abstract
We introduce Step 3.5 Flash, a sparse Mixture-of-Experts (MoE) model that bridges frontier-level agentic intelligence and computational efficiency. We focus on what matters most when building agents: sharp reasoning and fast, reliable execution. Step 3.5 Flash pairs a 196B-parameter foundation with 11B active parameters for efficient inference. It is optimized with interleaved 3:1 sliding-window/full attention and Multi-Token Prediction (MTP-3) to reduce the latency and cost of multi-round agentic interactions. To reach frontier-level intelligence, we design a scalable reinforcement learning framework that combines verifiable signals with preference feedback, while remaining stable under large-scale off-policy training, enabling consistent self-improvement across mathematics, code, and tool use. Step 3.5 Flash demonstrates strong performance across agent, coding, and math tasks,…
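As a rough illustration of two figures from the abstract, the sketch below lays out a 3:1 interleaving of sliding-window and full attention layers and computes the share of parameters that is active. The layer count, the position of the full-attention layer within each group of four, and the helper names `attention_layout` and `active_fraction` are assumptions made for illustration only; the abstract does not specify them.

```python
def attention_layout(num_layers: int, swa_per_full: int = 3) -> list[str]:
    """Per-layer attention type with `swa_per_full` sliding-window layers per full layer.

    Assumption: the full-attention layer closes each group; the abstract
    gives only the 3:1 ratio, not the ordering.
    """
    period = swa_per_full + 1
    return [
        "full" if (i + 1) % period == 0 else "sliding_window"
        for i in range(num_layers)
    ]


def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Share of parameters that is active, given counts in billions."""
    return active_params_b / total_params_b


# Hypothetical 8-layer stack, chosen only to keep the printout short.
layout = attention_layout(8)
print(layout)

# 11B active out of 196B total, as stated in the abstract.
print(f"{active_fraction(11, 196):.1%}")  # prints 5.6%
```

With these assumptions, three of every four layers attend over a local window only, and roughly one parameter in eighteen is active, which is where the inference savings described in the abstract would come from.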
Code & Models
- 🤗 stepfun-ai/Step-3.5-Flash (model) · 93k dl · ♡ 754
- 🤗 stepfun-ai/Step-3.5-Flash-Base-Midtrain (model) · 242 dl · ♡ 39
- 🤗 stepfun-ai/Step-3.5-Flash-Base (model) · 802 dl · ♡ 81
- 🤗 stepfun-ai/Step-3.5-Flash-FP8 (model) · 298k dl · ♡ 51
- 🤗 tacos4me/Step-3.5-Flash-NVFP4 (model) · 1.3k dl · ♡ 10
- 🤗 stepfun-ai/Step-3.5-Flash-GGUF-Q4_K_S (model) · 15k dl · ♡ 140
- 🤗 stepfun-ai/Step-3.5-Flash-GGUF-Q8_0 (model) · 232 dl · ♡ 3
- 🤗 cyankiwi/Step-3.5-Flash-AWQ-4bit (model) · 209 dl
- 🤗 Tawheeb123/Step-3.5-Flash (model) · 16 dl
- 🤗 models123/Step-3.5-Flash-FP8 (model) · 31 dl
Taxonomy
Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Machine Learning and Algorithms
