MoE-CT: A Novel Approach For Large Language Models Training With Resistance To Catastrophic Forgetting

Tianhao Li; Shangjie Li; Binbin Xie; Deyi Xiong; Baosong Yang

arXiv:2407.00875 · cs.CL · July 2, 2024 · 2 cites

Open Access

TL;DR

This paper introduces MoE-CT, a novel training architecture for large language models that preserves high-resource language performance while enhancing low-resource language capabilities through a frozen base model and an appended MoE module.

Contribution

The paper presents a new MoE-CT architecture that separates base model training from multilingual expansion, improving low-resource language performance without degrading high-resource language proficiency.

Findings

01 Outperforms conventional continual training methods in multilingual benchmarks.

02 Demonstrates enhanced resistance to catastrophic forgetting.

03 Shows improved transfer learning capabilities.

Abstract

The advent of large language models (LLMs) has predominantly catered to high-resource languages, leaving a disparity in performance for low-resource languages. Conventional Continual Training (CT) approaches to bridge this gap often undermine a model's original linguistic proficiency when expanding to multilingual contexts. Addressing this issue, we introduce a novel MoE-CT architecture, a paradigm that innovatively separates the base model's learning from the multilingual expansion process. Our design freezes the original LLM parameters, thus safeguarding its performance in high-resource languages, while an appended MoE module, trained on diverse language datasets, augments low-resource language proficiency. Our approach significantly outperforms conventional CT methods, as evidenced by our experiments, which show marked improvements in multilingual benchmarks without sacrificing the…
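The core idea in the abstract, a frozen base model with a trainable MoE module appended to it, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the class name `FrozenBasePlusMoE`, the single dense layer standing in for the base LLM, the softmax top-k gate, the residual combination of base and expert outputs, and the `nudge_trainable` helper that stands in for an optimizer step. The only point it tries to show is that updates touch the gate and experts while the base weights stay unchanged.

```python
import math
import random


def linear(weights, x):
    """Apply a dense layer: weights is a list of rows, x is a list of floats."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class FrozenBasePlusMoE:
    """Hypothetical toy model: one frozen base layer plus an appended MoE block."""

    def __init__(self, dim, num_experts=4, top_k=2, seed=0):
        rng = random.Random(seed)
        # Stand-in for the original LLM; never exposed as trainable.
        self.base = random_matrix(dim, dim, rng)
        # Appended MoE module (assumed structure): a gate and several experts.
        self.gate = random_matrix(num_experts, dim, rng)
        self.experts = [random_matrix(dim, dim, rng) for _ in range(num_experts)]
        self.top_k = top_k

    def trainable_parameters(self):
        """Only the MoE parameters are returned; the base is excluded (frozen)."""
        return [self.gate] + self.experts

    def forward(self, x):
        h = linear(self.base, x)                 # frozen base representation
        probs = softmax(linear(self.gate, h))    # gate scores over experts
        chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        chosen = chosen[: self.top_k]            # keep the top-k experts
        norm = sum(probs[i] for i in chosen)     # renormalize over the chosen ones
        moe_out = [0.0] * len(h)
        for i in chosen:
            weight = probs[i] / norm
            expert_out = linear(self.experts[i], h)
            moe_out = [a + weight * b for a, b in zip(moe_out, expert_out)]
        # Assumed residual combination of base output and MoE output.
        return [a + b for a, b in zip(h, moe_out)]


def nudge_trainable(model, delta):
    """Stand-in for an optimizer step: shifts every trainable weight by delta."""
    for matrix in model.trainable_parameters():
        for row in matrix:
            for j in range(len(row)):
                row[j] += delta
```

In this toy setup, calling `nudge_trainable(model, 0.01)` changes the output of `model.forward(x)` but leaves `model.base` identical, which mirrors the abstract's claim that the original parameters are protected during multilingual training. A real implementation would more likely use a deep learning framework and disable gradients on the base parameters rather than filter them by hand.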

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Mixture of Experts · Balanced Selection