Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM
Yang Liu, Xiaolong Zhong, Ling Jiang

TL;DR
Xmodel-2.5 is a 1.3-billion-parameter small language model built for efficient reasoning. It is trained with maximal-update parameterization (μP) and FP8 mixed-precision training to reach strong reasoning performance at reduced computational cost.
Contribution
The paper introduces Xmodel-2.5, a small, data-efficient reasoning language model. Its training recipe transfers hyper-parameters from a 20M-parameter proxy via μP and switches the optimizer from AdamW to Muon during the decay phase to improve reasoning accuracy.
Findings
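Switching from AdamW to Muon during the decay phase improves the 13-task reasoning average by 4.58%, with every other hyper-parameter held fixed.
Maximal-update parameterization (μP) lets hyper-parameters tuned on a 20M-parameter proxy transfer directly to the 1.3B model, even with tied word embeddings.
FP8 mixed-precision training balances accuracy and throughput.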
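The hyper-parameter transfer in the second finding can be illustrated with a minimal sketch. It assumes the commonly cited μP rule for Adam-style optimizers, in which the learning rate of matrix-like (hidden) weights shrinks in proportion to the width multiplier while vector-like parameters keep the proxy's learning rate. The function name, the parameter-group labels, and the widths in the usage note are hypothetical. The summary does not state the paper's exact scaling rules, and this sketch does not model the tied-embedding case.

```python
def mup_scaled_lrs(base_lr, base_width, target_width):
    """Scale per-group learning rates from a narrow proxy to a wider model.

    Simplified illustration of a muP-style rule for Adam-like optimizers
    (an assumption, not the paper's exact recipe):
      - vector-like parameters keep the proxy learning rate
      - matrix-like hidden weights use base_lr / width_mult
    """
    if base_width <= 0 or target_width <= 0:
        raise ValueError("widths must be positive")
    width_mult = target_width / base_width
    return {
        "vector_like": base_lr,               # unchanged across widths
        "hidden_matrix": base_lr / width_mult,  # shrinks as the model widens
    }
```

For example, with a hypothetical proxy width of 256 and target width of 2048, `mup_scaled_lrs(0.01, 256, 2048)` leaves the vector-like learning rate at 0.01 and divides the hidden-matrix learning rate by 8.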
Abstract
Large language models deliver strong reasoning and tool-use skills, yet their computational demands make them impractical for edge or cost-sensitive deployments. We present Xmodel-2.5, a 1.3-billion-parameter small language model designed as a drop-in agent core. Training with maximal-update parameterization (μP) allows hyper-parameters tuned on a 20M-parameter proxy to transfer directly to the full model, even under the parameter-tied tie-word-embedding architecture. A 1.4T-token Warmup-Stable-Decay curriculum is used, and we further show that switching from AdamW to Muon during the decay phase improves the 13-task reasoning average by 4.58% while keeping every other hyper-parameter fixed, verifying that early AdamW stability can be paired with late Muon sharpening for better downstream performance. FP8-mixed-precision training balances accuracy…
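The Warmup-Stable-Decay curriculum and the optimizer switch described in the abstract can be sketched as a step-indexed schedule. This is a minimal illustration under stated assumptions: linear warmup and linear decay, and hypothetical step counts. The abstract specifies neither the decay shape nor the phase lengths. The function names and the optimizer labels `"adamw"` and `"muon"` are placeholders, not calls into any library.

```python
def wsd_phase(step, total_steps, warmup_steps, decay_steps):
    """Return 'warmup', 'stable', or 'decay' for a 0-indexed training step."""
    decay_start = total_steps - decay_steps
    if step < warmup_steps:
        return "warmup"
    if step < decay_start:
        return "stable"
    return "decay"


def wsd_lr(step, total_steps, warmup_steps, decay_steps, peak_lr, min_lr=0.0):
    """Learning rate at a step: linear warmup, constant plateau, linear decay.

    The linear shapes are an assumption made for this sketch.
    """
    phase = wsd_phase(step, total_steps, warmup_steps, decay_steps)
    if phase == "warmup":
        return peak_lr * (step + 1) / warmup_steps
    if phase == "stable":
        return peak_lr
    decay_start = total_steps - decay_steps
    progress = (step - decay_start) / decay_steps  # 0.0 at decay start, 1.0 at the end
    return peak_lr + (min_lr - peak_lr) * progress


def optimizer_for(step, total_steps, warmup_steps, decay_steps):
    """Pick the optimizer label: AdamW before the decay phase, Muon during it."""
    phase = wsd_phase(step, total_steps, warmup_steps, decay_steps)
    return "muon" if phase == "decay" else "adamw"
```

In a training loop, the label returned by `optimizer_for` would decide which optimizer object takes the update at each step, while `wsd_lr` supplies the learning rate, so every other hyper-parameter stays fixed across the switch.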
Taxonomy
Topics: Topic Modeling · Machine Learning in Materials Science · Machine Learning and Data Classification
