Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts
Chen Yang, Guangyue Peng, Jiaying Zhu, Ran Le, Ruixiang Feng, Tao Zhang, Xiyun Xu, Yang Song, Yiming Jia, Yuntao Wen, Yunzhi Xu, Zekai Wang, Zhenwei An, Zhicong Sun, Zongchao Chen

TL;DR
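Nanbeige4.1-3B is a 3-billion-parameter generalist language model that targets reasoning, code generation, and agentic tool use in a single model. The authors report that it outperforms same-size and some larger models, crediting combined point-wise and pair-wise reward modeling, complexity-aware reinforcement learning rewards for code, and turn-level supervision for deep search.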
Contribution
The paper introduces Nanbeige4.1-3B, which the authors describe as, to the best of their knowledge, the first open-source small language model (SLM) to combine general reasoning, code generation, and agentic behavior in a single model.
Findings
Outperforms similar-sized models like Nanbeige4-3B-2511 and Qwen3-4B.
Achieves superior performance compared to larger models such as Qwen3-30B.
Executes up to 600 tool-call turns reliably for complex tasks.
Abstract
We present Nanbeige4.1-3B, a unified generalist language model that simultaneously achieves strong agentic behavior, code generation, and general reasoning with only 3B parameters. To the best of our knowledge, it is the first open-source small language model (SLM) to achieve such versatility in a single model. To improve reasoning and preference alignment, we combine point-wise and pair-wise reward modeling, ensuring high-quality, human-aligned responses. For code generation, we design complexity-aware rewards in Reinforcement Learning, optimizing both correctness and efficiency. In deep search, we perform complex data synthesis and incorporate turn-level supervision during training. This enables stable long-horizon tool interactions, allowing Nanbeige4.1-3B to reliably execute up to 600 tool-call turns for complex problem-solving. Extensive experimental results show that…
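The abstract says point-wise and pair-wise reward modeling are combined but does not say how. As a rough illustration only, the sketch below adds a squared-error point-wise term to a Bradley-Terry pair-wise term. This is a generic textbook combination, not the paper's formulation; the function names, the squared-error choice, and the `pair_weight` parameter are all assumptions made for this example.

```python
import math


def pointwise_loss(score: float, label: float) -> float:
    """Squared error between a predicted scalar reward and a target rating."""
    return (score - label) ** 2


def pairwise_loss(score_chosen: float, score_rejected: float) -> float:
    """Bradley-Terry negative log-likelihood that the chosen response wins.

    Equals -log(sigmoid(margin)), computed as softplus(-margin) in a
    numerically stable form.
    """
    margin = score_chosen - score_rejected
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def combined_loss(
    score_chosen: float,
    score_rejected: float,
    label_chosen: float,
    label_rejected: float,
    pair_weight: float = 1.0,  # hypothetical mixing weight, not from the paper
) -> float:
    """Sum of point-wise terms for both responses plus a weighted pair-wise term."""
    point = pointwise_loss(score_chosen, label_chosen) + pointwise_loss(
        score_rejected, label_rejected
    )
    pair = pairwise_loss(score_chosen, score_rejected)
    return point + pair_weight * pair
```

In this sketch the point-wise term anchors each score to an absolute rating, while the pair-wise term only cares about the ordering of the two responses; `pair_weight` trades one off against the other.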
Peer Reviews
No public reviews on file for this paper yet.
Code & Models
- 🤗 Nanbeige/Nanbeige4.1-3B · model · 573k dl · ♡ 1024
- 🤗 heretic-org/Nanbeige4.1-3B-heretic · model · 470 dl · ♡ 48
- 🤗 FastFlowLM/Nanbeige4.1-3B-NPU2 · model · 209 dl · ♡ 1
- 🤗 MuXodious/Nanbeige4.1-3B-PaperWitch-heresy · model · 13 dl · ♡ 4
- 🤗 SimplySara/Nanbeige4.1-3B-i1-GGUF · model · 178 dl · ♡ 1
- 🤗 SimplySara/Nanbeige4.1-3B-GGUF · model · 196 dl · ♡ 1
- 🤗 futuregoldenbuddha/Nanbeige4.1-3B-EXL2-8.0bpw · model · 12 dl · ♡ 1
- 🤗 INDERJEET3233/Nanbeige4.1-3B · model
- 🤗 void-818/Affine-star_v10-5Dy7KFivuHcFtLMM4PYnzkCgyAo7B3wRMft1CWur2jEzEmtQ · model · 12 dl
- 🤗 raggiebo/Nanbeige4.1-3B · model · 8 dl
Videos
No videos yet.
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics
