Xmodel-2 Technical Report

Wang Qun; Liu Yang; Lin Qingquan; Qu Zhijiu; Jiang Ling

arXiv:2412.19638 · cs.AI · December 30, 2024

TL;DR

Xmodel-2 is a 1.2-billion-parameter language model built for reasoning tasks. It shares one set of hyperparameters across model scales and trains with the WSD learning rate scheduler, reporting state-of-the-art results on complex reasoning and agent-based tasks at low training cost.

Contribution

Introduces Xmodel-2, a reasoning-focused language model with a unified hyperparameter design and effective training strategies for improved performance.

Findings

01 Achieves state-of-the-art reasoning performance

02 Maintains low training costs

03 Demonstrates effective transfer of configurations across model scales

Abstract

Xmodel-2 is a 1.2-billion-parameter large language model designed specifically for reasoning tasks. Its architecture enables different model scales to share a unified set of hyperparameters, allowing for extensive experimentation on smaller models and seamless transfer of optimal configurations to larger models. To maximize training efficiency and stability, Xmodel-2 employs the WSD learning rate scheduler from MiniCPM. Pretrained on 1.5 trillion tokens from diverse sources, Xmodel-2 achieves state-of-the-art performance in complex reasoning and agent-based tasks, while maintaining low training costs. These results highlight the potential of efficient model design and training strategies in advancing reasoning capabilities. Model checkpoints and code are publicly available on GitHub at https://github.com/XiaoduoAILab/Xmodel-2
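
The WSD (Warmup-Stable-Decay) scheduler mentioned in the abstract splits training into three phases: a warmup to the peak learning rate, a long plateau at that rate, and a short final decay. The sketch below is only an illustration of that shape. The function name `wsd_lr`, its parameters, the linear warmup, the exponential decay form, and the `final_ratio` default are all assumptions, not values or code taken from the Xmodel-2 or MiniCPM reports.

```python
def wsd_lr(step, peak_lr, warmup_steps, stable_steps, decay_steps, final_ratio=0.1):
    """Illustrative Warmup-Stable-Decay learning rate for a 0-indexed step.

    All names and the decay shape are hypothetical; the paper's exact
    schedule may differ.
    """
    if decay_steps <= 0:
        raise ValueError("decay_steps must be positive")

    # Phase 1: linear warmup, reaching peak_lr on the last warmup step.
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps

    # Phase 2: hold the learning rate constant at its peak.
    if step < warmup_steps + stable_steps:
        return peak_lr

    # Phase 3: exponential decay from peak_lr down to peak_lr * final_ratio,
    # clamped so that steps past the end keep the final value.
    t = min(step - warmup_steps - stable_steps, decay_steps)
    return peak_lr * final_ratio ** (t / decay_steps)


if __name__ == "__main__":
    # Toy schedule: 10 warmup steps, 20 stable steps, 10 decay steps.
    for s in (0, 9, 15, 30, 35, 40):
        print(s, wsd_lr(s, peak_lr=1e-2, warmup_steps=10, stable_steps=20, decay_steps=10))
```

One commonly cited appeal of this shape is that the plateau does not depend on the total step count, so a run can be extended or branched into a decay phase from any stable-phase checkpoint.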

Code & Models

Repositories

xiaoduoailab/xmodel-2 (Official)

Models

🤗 XiaoduoAILab/Xmodel-2 · model · 9 downloads

Taxonomy

Topics: Simulation Techniques and Applications

Methods: Sparse Evolutionary Training