Conditions for Catastrophic Forgetting in Multilingual Translation
Danni Liu, Jan Niehues

TL;DR
This paper systematically investigates the conditions that lead to catastrophic forgetting in multilingual translation models, highlighting the roles of the model-to-data scale ratio, instruction-following ability, and cross-lingual alignment in mitigating forgetting.
Contribution
It provides a comprehensive empirical analysis identifying key factors influencing catastrophic forgetting and evaluates various fine-tuning strategies in multilingual models.
Findings
The relative scale of model size to fine-tuning data size is a primary determinant of forgetting.
Instruction-following ability matters more than model architecture for retaining multilingual knowledge.
Cross-lingual alignment helps mitigate forgetting and enables positive transfer.
Abstract
Fine-tuning multilingual foundation models on specific languages often induces catastrophic forgetting, degrading performance on languages unseen in fine-tuning. While this phenomenon is widely documented, the literature presents fragmented results about when forgetting occurs. To address this ambiguity, we conduct a systematic empirical study using machine translation as a testbed to identify the conditions that trigger catastrophic forgetting in multilingual fine-tuning. Through controlled experiments across different model architectures, data scales, and fine-tuning approaches, we reveal that the relative scale between model and data size is a primary determinant of forgetting. Moreover, we demonstrate that a model's instruction-following ability is more critical for retaining multilingual knowledge than its architecture. Contrary to assumptions, parameter-efficient fine-tuning…
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
