ChuXin: 1.6B Technical Report

Xiaomin Zhuang; Yufan Jiang; Qiaozhi He; Zhihua Wu

arXiv:2405.04828 · cs.CL · May 9, 2024

TL;DR

ChuXin is a fully open-source 1.6B-parameter language model released with its training data, training process, and evaluation code. Its context length is extended to 1M tokens, and it shows strong needle-in-a-haystack retrieval performance. The release aims to foster transparency and innovation in language modeling research.

Contribution

We provide the complete training pipeline, data, and evaluation tools for ChuXin, promoting open research and transparency in large language model development.

Findings

01. Extended context length to 1 million tokens.

02. Achieved strong needle-in-a-haystack retrieval performance.

03. Made all training resources publicly available.

Abstract

In this report, we present ChuXin, an entirely open-source language model with 1.6 billion parameters. Unlike most works, which open-source only the model weights and architecture, we have made everything needed to train a model available, including the training data, the training process, and the evaluation code. Our goal is to empower and strengthen the open research community, fostering transparency and enabling a new wave of innovation in the field of language modeling. Furthermore, we extend the context length to 1M tokens through lightweight continual pretraining and demonstrate strong needle-in-a-haystack retrieval performance. The weights for both models (the base model and the 1M-context model) are available on Hugging Face to download and use.
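For readers unfamiliar with the needle-in-a-haystack evaluation mentioned in the abstract, the following is a minimal sketch of the general idea, not the authors' evaluation code. A short fact (the "needle") is inserted at a chosen relative depth inside a long run of filler text, and the model is asked to retrieve it. All names here (`build_haystack`, `evaluate_retrieval`, `oracle_generate`, the filler sentences, and the passphrase) are hypothetical, context length is measured in sentences rather than tokens for simplicity, and a stand-in function takes the place of a real model.

```python
import random


def build_haystack(filler_sentences, needle, n_sentences, depth, seed=0):
    """Return filler text with `needle` inserted at relative `depth`.

    depth = 0.0 places the needle at the start, 1.0 at the end.
    """
    rng = random.Random(seed)
    sentences = [rng.choice(filler_sentences) for _ in range(n_sentences)]
    position = round(depth * n_sentences)
    sentences.insert(position, needle)
    return " ".join(sentences)


def evaluate_retrieval(generate, filler_sentences, needle, question, answer,
                       lengths, depths):
    """Score `generate` (a prompt -> text callable) over a length x depth grid.

    Returns a dict mapping (length, depth) to True if the answer was retrieved.
    """
    results = {}
    for n in lengths:
        for d in depths:
            context = build_haystack(filler_sentences, needle, n, d)
            prompt = f"{context}\n\nQuestion: {question}\nAnswer:"
            results[(n, d)] = answer.lower() in generate(prompt).lower()
    return results


def oracle_generate(prompt):
    """Hypothetical stand-in for a real model: it searches the prompt directly."""
    marker = "The secret passphrase is "
    start = prompt.find(marker)
    if start == -1:
        return "I don't know."
    start += len(marker)
    return prompt[start:prompt.index(".", start)]


# Assumed toy data; a real evaluation would use long documents as filler.
FILLER = [
    "The sky was clear that morning.",
    "Trains ran on schedule all week.",
    "The market closed slightly higher.",
]
NEEDLE = "The secret passphrase is orchid-47."

scores = evaluate_retrieval(
    oracle_generate, FILLER, NEEDLE,
    question="What is the secret passphrase?",
    answer="orchid-47",
    lengths=[10, 100],
    depths=[0.0, 0.5, 1.0],
)
accuracy = sum(scores.values()) / len(scores)
```

To evaluate an actual model, `oracle_generate` would be replaced by a function that sends the prompt to that model and returns its completion. The per-cell results in `scores` are what is typically plotted as a length-by-depth heatmap.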
