Dynamic data sampler for cross-language transfer learning in large language models
Yudong Li, Yuhao Feng, Wen Zhou, Zhe Zhao, Linlin Shen, Cheng Hou, Xianxu Hou

TL;DR
This paper introduces ChatFlow, a cross-language transfer learning approach with a dynamic data sampler to efficiently train large Chinese language models using mixed-language corpora, improving convergence and performance.
Contribution
The paper presents a novel dynamic data sampler and a transfer learning framework that effectively trains Chinese LLMs by leveraging multilingual data and progressive training strategies.
Findings
Accelerates model convergence compared to baseline methods.
Achieves superior performance on Chinese and English benchmarks.
Outperforms other Chinese models post-trained on LLaMA-2-7B.
Abstract
Large Language Models (LLMs) have gained significant attention in the field of natural language processing (NLP) due to their wide range of applications. However, training LLMs for languages other than English poses significant challenges, due to the difficulty of acquiring large-scale corpora and the requisite computing resources. In this paper, we propose ChatFlow, a cross-language transfer-based LLM, to address these challenges and train large Chinese language models in a cost-effective manner. We employ a mix of Chinese, English, and parallel corpora to continuously train the LLaMA2 model, aiming to align cross-language representations and facilitate knowledge transfer specifically to the Chinese language model. In addition, we use a dynamic data sampler to progressively transition the model from unsupervised pre-training to supervised fine-tuning. Experimental results demonstrate…
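The abstract does not spell out how the dynamic data sampler schedules its data, so the following is only a minimal sketch of the general idea: draw each training example from either an unsupervised pre-training pool or a supervised fine-tuning pool, with the probability of the supervised pool rising over training. The linear schedule, the function names (sft_probability, dynamic_sampler), and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
import random


def sft_probability(step, total_steps):
    """Probability of drawing a supervised example at a given step.

    Assumption: a linear ramp from 0.0 (first step) to 1.0 (last step).
    The paper's actual schedule is not given in the abstract.
    """
    if total_steps <= 1:
        return 1.0
    return min(1.0, max(0.0, step / (total_steps - 1)))


def dynamic_sampler(pretrain_data, sft_data, total_steps, seed=0):
    """Yield (source, example) pairs, shifting from pre-training to SFT data."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    for step in range(total_steps):
        p = sft_probability(step, total_steps)
        # rng.random() is in [0, 1), so p == 0.0 never picks SFT data
        # and p == 1.0 always does.
        if rng.random() < p:
            yield "sft", rng.choice(sft_data)
        else:
            yield "pretrain", rng.choice(pretrain_data)


if __name__ == "__main__":
    # Hypothetical toy pools standing in for real corpora.
    pretrain_pool = ["raw Chinese text", "raw English text", "parallel sentence pair"]
    sft_pool = ["instruction + response A", "instruction + response B"]
    for source, example in dynamic_sampler(pretrain_pool, sft_pool, total_steps=10):
        print(source, "|", example)
```

Under this assumed schedule the first step always draws pre-training data and the last step always draws supervised data; any other monotone schedule could be swapped into sft_probability without changing the sampler itself.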
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: ALIGN
