Baichuan 2: Open Large-scale Language Models
Aiyuan Yang, Bin Xiao, Bingning Wang, Borong Zhang, Ce Bian, Chao Yin,, Chenxu Lv, Da Pan, Dian Wang, Dong Yan, Fan Yang, Fei Deng, Feng Wang, Feng, Liu, Guangwei Ai, Guosheng Dong, Haizhou Zhao, Hang Xu, Haoze Sun, Hongda, Zhang, Hui Liu, Jiaming Ji, Jian Xie, JunTao Dai

TL;DR
Baichuan 2 is a series of large-scale multilingual language models with 7B and 13B parameters, trained on 2.6 trillion tokens, outperforming similar open-source models on various benchmarks and excelling in specialized domains.
Contribution
This paper introduces Baichuan 2, a new open-source multilingual LLM trained from scratch with 2.6 trillion tokens, demonstrating superior performance on multiple benchmarks and domains.
Findings
Baichuan 2 matches or exceeds other open-source models on benchmarks.
Baichuan 2 performs well in specialized fields like medicine and law.
All pre-training checkpoints will be released for research use.
Abstract
Large language models (LLMs) have demonstrated remarkable performance on a variety of natural language tasks based on just a few examples of natural language instructions, reducing the need for extensive feature engineering. However, most powerful LLMs are closed-source or limited in their capability for languages other than English. In this technical report, we present Baichuan 2, a series of large-scale multilingual language models containing 7 billion and 13 billion parameters, trained from scratch, on 2.6 trillion tokens. Baichuan 2 matches or outperforms other open-source models of similar size on public benchmarks like MMLU, CMMLU, GSM8K, and HumanEval. Furthermore, Baichuan 2 excels in vertical domains such as medicine and law. We will release all pre-training model checkpoints to benefit the research community in better understanding the training dynamics of Baichuan 2.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques
MethodsSoftmax · Attention Is All You Need
