ChuXin: 1.6B Technical Report
Xiaomin Zhuang, Yufan Jiang, Qiaozhi He, Zhihua Wu

TL;DR
ChuXin is an open-source 1.6B-parameter language model released with its complete training resources, a context length extended to 1M tokens, and strong needle-in-a-haystack retrieval, aimed at fostering transparency and innovation in language modeling research.
Contribution
We provide the complete training pipeline, data, and evaluation tools for ChuXin, promoting open research and transparency in large language model development.
Findings
Extended the context length to 1 million tokens through lightweight continual pretraining.
Achieved strong needle-in-a-haystack retrieval performance.
Made all training resources publicly available.
Abstract
In this report, we present ChuXin, a fully open-source language model with 1.6 billion parameters. Unlike most prior work, which open-sources only the model weights and architecture, we release everything needed to train the model, including the training data, the training process, and the evaluation code. Our goal is to empower and strengthen the open research community, fostering transparency and enabling a new wave of innovation in language modeling. Furthermore, we extend the context length to 1M tokens through lightweight continual pretraining and demonstrate strong needle-in-a-haystack retrieval performance. The weights for both models, the base model and its 1M-context variant, are available on Hugging Face to download and use.
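To make the needle-in-a-haystack claim concrete, here is a minimal sketch of how such a probe is commonly constructed: a short "needle" fact is inserted at a chosen relative depth inside long filler text, and the model is then asked to retrieve it. This is not ChuXin's released evaluation code. The function names (build_haystack_prompt, score_retrieval, depth_grid), the sentence-level insertion, and the case-insensitive substring scoring are illustrative assumptions. A real 1M-token test would measure context length with the model's tokenizer and sweep both context length and needle depth.

```python
def build_haystack_prompt(needle: str, filler_sentences: list[str],
                          depth: float, question: str) -> str:
    """Insert `needle` at relative `depth` (0.0 = start, 1.0 = end) of the filler.

    Hypothetical helper for illustration; the insertion point is rounded to the
    nearest sentence boundary (Python's round() uses round-half-to-even).
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    pos = round(depth * len(filler_sentences))
    context = filler_sentences[:pos] + [needle] + filler_sentences[pos:]
    # The retrieval question follows the full haystack.
    return " ".join(context) + "\n\nQuestion: " + question + "\nAnswer:"


def score_retrieval(model_answer: str, expected: str) -> bool:
    """Assumed scoring rule: success if the expected string appears in the answer."""
    return expected.lower() in model_answer.lower()


def depth_grid(n: int) -> list[float]:
    """Evenly spaced needle depths from 0.0 to 1.0 inclusive."""
    if n <= 1:
        return [0.0]
    return [i / (n - 1) for i in range(n)]


if __name__ == "__main__":
    filler = [f"Filler sentence number {i}." for i in range(10)]
    needle = "The secret code is 4721."
    for d in depth_grid(3):
        prompt = build_haystack_prompt(needle, filler, d, "What is the secret code?")
        print(d, prompt.index(needle))
```

In practice the prompt would be sent to the model and its answer passed to score_retrieval; sweeping depth_grid over several context lengths produces the familiar retrieval heatmap.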
