MoE-CT: A Novel Approach For Large Language Models Training With Resistance To Catastrophic Forgetting
Tianhao Li, Shangjie Li, Binbin Xie, Deyi Xiong, Baosong Yang

TL;DR
This paper introduces MoE-CT, a novel training architecture for large language models that preserves high-resource language performance while enhancing low-resource language capabilities through a frozen base model and an appended MoE module.
Contribution
The paper presents a new MoE-CT architecture that separates base model training from multilingual expansion, improving low-resource language performance without degrading high-resource language proficiency.
Findings
MoE-CT outperforms conventional continual training (CT) methods on multilingual benchmarks.
It resists catastrophic forgetting better than conventional CT, preserving the base model's high-resource language performance.
It also shows improved transfer learning capabilities.
Abstract
The advent of large language models (LLMs) has predominantly catered to high-resource languages, leaving a disparity in performance for low-resource languages. Conventional Continual Training (CT) approaches to bridge this gap often undermine a model's original linguistic proficiency when expanding to multilingual contexts. Addressing this issue, we introduce a novel MoE-CT architecture, a paradigm that innovatively separates the base model's learning from the multilingual expansion process. Our design freezes the original LLM parameters, thus safeguarding its performance in high-resource languages, while an appended MoE module, trained on diverse language datasets, augments low-resource language proficiency. Our approach significantly outperforms conventional CT methods, as evidenced by our experiments, which show marked improvements in multilingual benchmarks without sacrificing the…
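The freeze-and-append idea described in the abstract can be illustrated with a small, dependency-free sketch. Everything here is an assumption made for illustration and not the authors' implementation: the names `FrozenBase`, `AppendedMoE`, `forward`, `train_step`, and `mse` are hypothetical, the "base model" is a single linear map, the MoE output is simply added to the base output, the experts start at zero, and the gate is left untrained for brevity. The summary does not say how the MoE module is wired into the LLM.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class FrozenBase:
    """Stand-in for the pretrained LLM; its weights are never updated."""

    def __init__(self, weights):
        self.weights = tuple(weights)  # a tuple, so it cannot be mutated in place

    def __call__(self, x):
        return dot(self.weights, x)


class AppendedMoE:
    """Hypothetical appended module: a softmax gate over linear experts."""

    def __init__(self, dim, n_experts, rng):
        self.gate = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                     for _ in range(n_experts)]
        # Assumption: zero-initialised experts, so the combined model
        # starts out identical to the frozen base.
        self.experts = [[0.0] * dim for _ in range(n_experts)]

    def gate_probs(self, x):
        return softmax([dot(g, x) for g in self.gate])

    def __call__(self, x):
        probs = self.gate_probs(x)
        return sum(p * dot(e, x) for p, e in zip(probs, self.experts))


def forward(base, moe, x):
    # Assumption: the MoE output is added to the frozen base output.
    return base(x) + moe(x)


def mse(base, moe, batch):
    return sum((forward(base, moe, x) - y) ** 2 for x, y in batch) / len(batch)


def train_step(base, moe, batch, lr=0.1):
    """One pass of per-example SGD on squared error.

    Only the expert weights are updated; the base is frozen and the
    gate is kept fixed to keep the sketch short.
    """
    for x, y in batch:
        probs = moe.gate_probs(x)
        err = forward(base, moe, x) - y
        for k, expert in enumerate(moe.experts):
            for i in range(len(expert)):
                # d/d expert[k][i] of (f(x) - y)^2 = 2 * err * probs[k] * x[i]
                expert[i] -= lr * 2.0 * err * probs[k] * x[i]
```

In this toy setting, repeated calls to `train_step` reduce the error on the new data while `base.weights` stays exactly as it was, which mirrors the forgetting-resistance argument: the parameters that carry the original proficiency are never touched by the new-language updates.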
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Mixture of Experts · Balanced Selection
