Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts

Chen Yang; Guangyue Peng; Jiaying Zhu; Ran Le; Ruixiang Feng; Tao Zhang; Xiyun Xu; Yang Song; Yiming Jia; Yuntao Wen; Yunzhi Xu; Zekai Wang; Zhenwei An; Zhicong Sun; Zongchao Chen

arXiv:2602.13367·cs.AI·February 17, 2026




TL;DR

Nanbeige4.1-3B is a 3-billion-parameter generalist language model that combines reasoning, code generation, and agentic behavior. The authors report that it outperforms similar-sized models and some larger ones, such as Qwen3-30B.

Contribution

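The paper introduces Nanbeige4.1-3B, which the authors describe as, to the best of their knowledge, the first open-source small language model to combine agentic behavior, code generation, and general reasoning in a single model. Its training combines point-wise and pair-wise reward modeling for alignment, complexity-aware rewards for code generation, and turn-level supervision for deep search.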

Findings

01. Outperforms similar-sized models like Nanbeige4-3B-2511 and Qwen3-4B.

02. Achieves superior performance compared to larger models such as Qwen3-30B.

03. Executes up to 600 tool-call turns reliably for complex tasks.

Abstract

We present Nanbeige4.1-3B, a unified generalist language model that simultaneously achieves strong agentic behavior, code generation, and general reasoning with only 3B parameters. To the best of our knowledge, it is the first open-source small language model (SLM) to achieve such versatility in a single model. To improve reasoning and preference alignment, we combine point-wise and pair-wise reward modeling, ensuring high-quality, human-aligned responses. For code generation, we design complexity-aware rewards in Reinforcement Learning, optimizing both correctness and efficiency. In deep search, we perform complex data synthesis and incorporate turn-level supervision during training. This enables stable long-horizon tool interactions, allowing Nanbeige4.1-3B to reliably execute up to 600 tool-call turns for complex problem-solving. Extensive experimental results show that…
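The abstract says the code-generation rewards optimize "both correctness and efficiency" but does not give a formula. As a rough illustration only, one way such a reward could be shaped is sketched below. Everything in it is an assumption made for this example and not the authors' method: the `RunResult` fields, the `complexity_aware_reward` function, the `efficiency_weight` value, and the choice to grant the efficiency bonus only to fully correct solutions.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Outcome of running one candidate solution (hypothetical structure)."""
    passed: int         # number of test cases the candidate passed
    total: int          # number of test cases that were run
    runtime_s: float    # measured runtime of the candidate, in seconds
    reference_s: float  # runtime of a known-good reference solution, in seconds


def complexity_aware_reward(result: RunResult, efficiency_weight: float = 0.3) -> float:
    """Illustrative reward in [0, 1]: correctness first, efficiency second.

    Assumption for this sketch: a solution earns an efficiency bonus only
    when it passes every test, so fast-but-wrong code is never preferred.
    """
    if result.total <= 0:
        raise ValueError("total must be positive")

    correctness = result.passed / result.total
    if result.passed < result.total:
        # Partially correct: scale by (1 - weight) so the reward stays
        # strictly below that of any fully correct solution.
        return (1.0 - efficiency_weight) * correctness

    # Fully correct: add a bonus proportional to speed relative to the
    # reference, capped at 1 so beating the reference gives no extra credit.
    speed = min(1.0, result.reference_s / max(result.runtime_s, 1e-9))
    return (1.0 - efficiency_weight) + efficiency_weight * speed
```

Under these assumptions, a fully correct solution that runs twice as slowly as the reference scores lower than one that matches it, while a fast solution that fails half the tests scores lower than both.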



Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics