Steel-LLM: From Scratch to Open Source -- A Personal Journey in Building a Chinese-Centric LLM
Qingshui Gu, Shu Li, Tianyu Zheng, Zhaoxiang Zhang

TL;DR
Steel-LLM is a Chinese-centric, open-source 1-billion-parameter language model built from scratch with limited computational resources. It achieves competitive benchmark performance, and the project shares practical insights from the development process.
Contribution
The paper details the end-to-end development of Steel-LLM, including data collection, model design, training strategies, and challenges, providing a valuable resource for open-source LLM development.
Findings
Steel-LLM outperforms early models from larger institutions on the CEVAL and CMMLU benchmarks.
The model demonstrates competitive performance with limited computational resources.
Training scripts and checkpoints are publicly available.
Abstract
Steel-LLM is a Chinese-centric language model developed from scratch with the goal of creating a high-quality, open-source model despite limited computational resources. Launched in March 2024, the project aimed to train a 1-billion-parameter model on a large-scale dataset, prioritizing transparency and the sharing of practical insights to assist others in the community. The training process primarily focused on Chinese data, with a small proportion of English data included, addressing gaps in existing open-source LLMs by providing a more detailed and practical account of the model-building journey. Steel-LLM has demonstrated competitive performance on benchmarks such as CEVAL and CMMLU, outperforming early models from larger institutions. This paper provides a comprehensive summary of the project's key contributions, including data collection, model design, training methodologies, and…
Taxonomy
Topics: Library Science and Information Systems
