Overcoming Catastrophic Forgetting in Zero-Shot Cross-Lingual Generation
Tu Vu, Aditya Barua, Brian Lester, Daniel Cer, Mohit Iyyer, Noah Constant

TL;DR
This paper investigates how prompt tuning can mitigate catastrophic forgetting in zero-shot cross-lingual summarization, demonstrating improvements over standard fine-tuning but highlighting remaining challenges compared to supervised methods.
Contribution
It introduces the first study of prompt tuning for zero-shot cross-lingual generation and proposes methods to enhance transfer without parallel data.
Findings
Prompt tuning outperforms standard fine-tuning for less-related languages.
Mixing multilingual data improves zero-shot transfer quality.
Explicit prompt factorization further enhances cross-lingual performance.
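Prompt factorization can be illustrated with a minimal sketch. It assumes factorization means splitting the soft prompt into a per-language prompt and a shared task prompt, concatenated in front of the input embeddings. The names `language_prompts`, `task_prompt`, `build_input`, and `random_vectors` are hypothetical, and so are the prompt lengths and the [language; task] order; none of them are taken from the paper. Training on English uses the English language prompt. Zero-shot generation swaps in the target-language prompt and reuses the task prompt unchanged.

```python
import random

# Hypothetical sizes chosen for illustration; not the paper's settings.
D_MODEL = 4   # width of each embedding vector
LANG_LEN = 2  # soft tokens in each language prompt
TASK_LEN = 3  # soft tokens in the shared task prompt

rng = random.Random(0)


def random_vectors(n):
    """Return n random vectors of size D_MODEL (stand-ins for learned embeddings)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(D_MODEL)] for _ in range(n)]


# One language prompt per language, plus one task prompt shared across languages.
language_prompts = {lang: random_vectors(LANG_LEN) for lang in ("en", "th")}
task_prompt = random_vectors(TASK_LEN)


def build_input(lang, input_embeds):
    """Prepend [language prompt; task prompt] to the embedded input sequence."""
    return language_prompts[lang] + task_prompt + input_embeds


tokens = random_vectors(5)  # stand-in for the frozen model's embedded input tokens

train_input = build_input("en", tokens)      # English training examples
zero_shot_input = build_input("th", tokens)  # zero-shot: swap in the Thai language prompt
print(len(train_input), len(zero_shot_input))
```

The design keeps task knowledge in one component and language knowledge in another. Changing the target language therefore only replaces the first `LANG_LEN` vectors.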
Abstract
In this paper, we explore the challenging problem of performing a generative task in a target language when labeled data is only available in English, using summarization as a case study. We assume a strict setting with no access to parallel data or machine translation and find that common transfer learning approaches struggle in this setting, as a generative multilingual model fine-tuned purely on English catastrophically forgets how to generate non-English. Given the recent rise of parameter-efficient adaptation techniques, we conduct the first investigation into how one such method, prompt tuning (Lester et al., 2021), can overcome catastrophic forgetting to enable zero-shot cross-lingual generation. Our experiments show that parameter-efficient prompt tuning provides gains over standard fine-tuning when transferring between less-related languages, e.g., from English to Thai.…
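Prompt tuning (Lester et al., 2021) keeps the pretrained model frozen and learns only a small set of soft-prompt vectors prepended to the input embeddings. The following toy sketch shows that mechanic. It assumes a hypothetical "frozen model" that is just a fixed linear readout over mean-pooled embeddings, not mT5 or any real architecture. The squared-error target is also an illustrative assumption. Only `prompt` receives gradient updates.

```python
import random

rng = random.Random(0)
D_MODEL, PROMPT_LEN, SEQ_LEN = 4, 2, 3  # hypothetical toy sizes

# Toy stand-in for a frozen pretrained model: a fixed linear readout of the
# mean-pooled input sequence. These weights are never updated.
frozen_w = [rng.gauss(0.0, 1.0) for _ in range(D_MODEL)]

# Frozen embeddings of one input example and its scalar target.
tokens = [[rng.gauss(0.0, 1.0) for _ in range(D_MODEL)] for _ in range(SEQ_LEN)]
target = 1.0

# The soft prompt is the only trainable parameter.
prompt = [[0.0] * D_MODEL for _ in range(PROMPT_LEN)]


def forward(prompt):
    seq = prompt + tokens  # prepend the soft prompt to the input embeddings
    total = len(seq)
    pooled = [sum(vec[j] for vec in seq) / total for j in range(D_MODEL)]
    return sum(p * w for p, w in zip(pooled, frozen_w))


def loss(prompt):
    return 0.5 * (forward(prompt) - target) ** 2


def train_step(prompt, lr):
    """One gradient step on the prompt; frozen_w and tokens are left untouched."""
    err = forward(prompt) - target
    total = PROMPT_LEN + SEQ_LEN
    # d(loss)/d(prompt[i][j]) = err * frozen_w[j] / total for every prompt row i.
    return [[v - lr * err * frozen_w[j] / total for j, v in enumerate(row)]
            for row in prompt]


initial_loss = loss(prompt)
frozen_before = list(frozen_w)
for _ in range(100):
    prompt = train_step(prompt, lr=0.5)
print(loss(prompt) < initial_loss, frozen_w == frozen_before)
```

Only `PROMPT_LEN * D_MODEL` values are trained, so the backbone's weights cannot drift from their pretrained state. This is the intuition for why prompt tuning can limit forgetting of non-English generation compared with full fine-tuning.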
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
