TL;DR
ZmBART is an unsupervised cross-lingual transfer framework that enables natural language generation in low-resource languages without using parallel data, leveraging monolingual pre-training and task-specific fine-tuning.
Contribution
This work introduces ZmBART, an unsupervised method for NLG that transfers supervision from a high-resource language to low-resource languages without relying on parallel data.
Findings
Effective zero-shot transfer to low-resource languages.
Improved performance with few-shot training.
Robustness demonstrated through ablations and analyses.
Abstract
Despite the recent advancement in NLP research, cross-lingual transfer for natural language generation is relatively understudied. In this work, we transfer supervision from a high-resource language (HRL) to multiple low-resource languages (LRLs) for natural language generation (NLG). We consider four NLG tasks (text summarization, question generation, news headline generation, and distractor generation) and three syntactically diverse languages, i.e., English, Hindi, and Japanese. We propose an unsupervised cross-lingual language generation framework (called ZmBART) that does not use any parallel or pseudo-parallel/back-translated data. In this framework, we further pre-train the mBART sequence-to-sequence denoising auto-encoder model with an auxiliary task using monolingual data of three languages. The objective function of the auxiliary task is close to the target tasks, which enriches the…
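The abstract excerpt is cut off before it defines the auxiliary task, so the following is only an illustrative sketch of how (input, target) training pairs could be built from monolingual text alone, with no parallel data. The function name `make_auxiliary_pair`, the `fraction` parameter, and the choice of a random sentence subset as the target are assumptions made for illustration, not the paper's specification.

```python
import random
import re

# Naive sentence boundary: split after ., !, ?, the Hindi danda, or the
# Japanese full stop. This is a simplification (it also splits "3.5").
_SENTENCE_END = re.compile(r"(?<=[.!?।。])\s*")


def make_auxiliary_pair(passage, fraction=0.2, seed=None):
    """Return (source, target) built from one monolingual passage.

    ASSUMPTION: the target is a random subset of the passage's sentences,
    kept in their original order. This stands in for an auxiliary objective
    that resembles a generation task; it is not taken from the paper.
    """
    rng = random.Random(seed)
    sentences = [s for s in _SENTENCE_END.split(passage) if s.strip()]
    if not sentences:
        raise ValueError("passage contains no sentences")
    # Keep at least one sentence, otherwise roughly `fraction` of them.
    k = max(1, round(fraction * len(sentences)))
    chosen = sorted(rng.sample(range(len(sentences)), k))
    target = " ".join(sentences[i] for i in chosen)
    return passage, target


if __name__ == "__main__":
    text = "One. Two. Three. Four. Five."
    source, target = make_auxiliary_pair(text, fraction=0.4, seed=0)
    print(source)
    print(target)
```

Pairs like these could then be fed to any sequence-to-sequence trainer for each language's monolingual corpus before fine-tuning on the task data in the high-resource language.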
Taxonomy
Methods: mBART
