Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM

Yang Liu; Xiaolong Zhong; Ling Jiang

arXiv:2511.19496 · cs.LG · November 26, 2025

TL;DR

Xmodel-2.5 is a 1.3-billion-parameter small language model built for reasoning in edge and cost-sensitive deployments. It is trained with maximal-update parameterization (μP), a 1.4T-token Warmup-Stable-Decay curriculum that switches from AdamW to Muon during the decay phase, and FP8 mixed-precision training.

Contribution

The paper introduces Xmodel-2.5, a small, data-efficient reasoning language model, together with its training recipe: hyper-parameters transferred from a 20M-parameter proxy via μP, and a switch from AdamW to Muon during the decay phase that improves reasoning accuracy.

Findings

01

Switching from AdamW to Muon during the decay phase improves the 13-task reasoning average by 4.58%, with every other hyper-parameter held fixed.

02

Maximal-update parameterization (μP) lets hyper-parameters tuned on a 20M-parameter proxy transfer directly to the full 1.3B model.

03

FP8 mixed-precision balances accuracy and throughput.
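
To make the μP finding concrete, here is a minimal sketch of the learning-rate rule commonly described for μP with Adam-type optimizers: hidden weight matrices have their learning rate scaled by the inverse of the width multiplier, while vector-like parameters (biases, norm gains) keep the proxy's rate. The function name, the widths, and the base learning rate are hypothetical. The sketch leaves out the output and tied-embedding layers, and it is not the paper's exact recipe.

```python
def mup_adam_lrs(base_lr: float, base_width: int, width: int) -> dict:
    """Per-group Adam learning rates under a commonly described muP rule.

    base_lr    : learning rate tuned on the small proxy model (hypothetical value)
    base_width : hidden width of the proxy (hypothetical value)
    width      : hidden width of the target model (hypothetical value)
    """
    width_mult = width / base_width
    return {
        # Hidden weight matrices: learning rate shrinks as 1 / width multiplier.
        "hidden_matrices": base_lr / width_mult,
        # Biases and norm gains: learning rate is kept as tuned on the proxy.
        "vector_like": base_lr,
    }


# Illustrative widths only; these are not the paper's model dimensions.
lrs = mup_adam_lrs(base_lr=1e-2, base_width=256, width=2048)
print(lrs)
```

Under this rule the proxy's tuned rate is reused on the larger model and only the per-group scaling changes, which is the general sense in which μP makes hyper-parameters transferable.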

Abstract

Large language models deliver strong reasoning and tool-use skills, yet their computational demands make them impractical for edge or cost-sensitive deployments. We present Xmodel-2.5, a 1.3-billion-parameter small language model designed as a drop-in agent core. Training with maximal-update parameterization (μP) allows hyper-parameters tuned on a 20M-parameter proxy to transfer directly to the full model, even under the parameter-tied tie-word-embedding architecture. A 1.4T-token Warmup-Stable-Decay curriculum is used, and we further show that switching from AdamW to Muon during the decay phase improves the 13-task reasoning average by 4.58% while keeping every other hyper-parameter fixed, verifying that early AdamW stability can be paired with late Muon sharpening for better downstream performance. FP8-mixed-precision training balances accuracy…
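
The Warmup-Stable-Decay curriculum and the optimizer switch described in the abstract can be sketched as a schedule function plus a phase-based optimizer selector. This is a minimal illustration under stated assumptions: the step counts, learning rates, linear warmup, and linear decay shape are hypothetical placeholders, and `WSDConfig`, `wsd_lr`, and `optimizer_for_step` are names invented for this sketch, not taken from the paper's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WSDConfig:
    # All values below are hypothetical placeholders, not the paper's settings.
    total_steps: int = 1000
    warmup_steps: int = 100
    decay_steps: int = 200
    peak_lr: float = 1e-2
    final_lr: float = 1e-4


def wsd_lr(step: int, cfg: WSDConfig) -> float:
    """Learning rate at `step` for a Warmup-Stable-Decay schedule."""
    decay_start = cfg.total_steps - cfg.decay_steps
    if step < cfg.warmup_steps:
        # Warmup: linear ramp up to the peak learning rate (assumed shape).
        return cfg.peak_lr * (step + 1) / cfg.warmup_steps
    if step < decay_start:
        # Stable: hold the peak learning rate.
        return cfg.peak_lr
    # Decay: linear interpolation toward final_lr (assumed shape).
    progress = (step - decay_start) / cfg.decay_steps
    return cfg.peak_lr + (cfg.final_lr - cfg.peak_lr) * progress


def optimizer_for_step(step: int, cfg: WSDConfig) -> str:
    """AdamW during warmup and stable phases, Muon once decay begins."""
    decay_start = cfg.total_steps - cfg.decay_steps
    return "muon" if step >= decay_start else "adamw"


cfg = WSDConfig()
for s in (0, 99, 500, 799, 800, 999):
    print(s, optimizer_for_step(s, cfg), wsd_lr(s, cfg))
```

In a real training loop the selector would decide which optimizer object performs the update. All other hyper-parameters would stay unchanged across the switch, matching the abstract's description.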

Taxonomy

Topics: Topic Modeling · Machine Learning in Materials Science · Machine Learning and Data Classification