The unreasonable effectiveness of few-shot learning for machine translation
Xavier Garcia, Yamini Bansal, Colin Cherry, George Foster, Maxim Krikun, Fangxiaoyu Feng, Melvin Johnson, Orhan Firat

TL;DR
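This paper shows that a decoder-only transformer trained only with self-supervised learning on unpaired language data can produce high-quality translations when given just five example translations at inference. It matches specialized supervised systems and outperforms the best system on WMT'21 English-Chinese news translation, without joint multilingual training or back-translation.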
Contribution
It introduces a few-shot translation approach that needs only a handful of parallel examples and a conceptually simple training setup, and that shows competitive results along with control over translation attributes.
Findings
Outperforms state-of-the-art on WMT'21 English-Chinese translation with only five examples.
Models are two orders of magnitude smaller than current large language models.
Performance heavily depends on the quality of few-shot demonstrations.
Abstract
We demonstrate the potential of few-shot translation systems, trained with unpaired language data, for both high- and low-resource language pairs. We show that with only 5 examples of high-quality translation data shown at inference, a transformer decoder-only model trained solely with self-supervised learning is able to match specialized supervised state-of-the-art models as well as more general commercial translation systems. In particular, we outperform the best performing system on the WMT'21 English-Chinese news translation task by only using five examples of English-Chinese parallel data at inference. Moreover, our approach in building these models does not necessitate joint multilingual training or back-translation, is conceptually simple and shows the potential to extend to the multilingual setting. Furthermore, the resulting models are two orders of magnitude smaller than…
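The abstract describes showing five parallel examples to the model at inference time. A minimal sketch of how such a five-shot prompt could be assembled is given below. The prompt layout, the function name build_few_shot_prompt, and the demonstration pairs are all assumptions made for illustration; the summary above does not specify the paper's actual prompt format or examples.

```python
from typing import List, Tuple

def build_few_shot_prompt(
    demos: List[Tuple[str, str]],
    source_sentence: str,
    src_lang: str = "English",
    tgt_lang: str = "Chinese",
) -> str:
    """Concatenate demonstration pairs, then the sentence to translate.

    The "Language: text" layout is a hypothetical format, not the paper's.
    """
    blocks = [f"{src_lang}: {src}\n{tgt_lang}: {tgt}" for src, tgt in demos]
    # The final block leaves the target side empty for the model to complete.
    blocks.append(f"{src_lang}: {source_sentence}\n{tgt_lang}:")
    return "\n\n".join(blocks)

# Five illustrative placeholder pairs (not taken from the paper or from WMT'21).
DEMOS = [
    ("Hello.", "你好。"),
    ("Thank you.", "谢谢。"),
    ("Good morning.", "早上好。"),
    ("How are you?", "你好吗？"),
    ("See you tomorrow.", "明天见。"),
]

prompt = build_few_shot_prompt(DEMOS, "Where is the station?")
print(prompt)
```

Because the Findings note that quality depends heavily on the demonstrations, the choice of the five pairs passed as demos is the main lever in a setup like this one.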
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
