TL;DR
This paper presents a method for teaching a pre-trained machine translation model to translate previously unseen words from only a handful of examples, using few-shot fine-tuning and contextual data augmentation. The adapted model is reported to outperform traditional systems trained on many more samples.
Contribution
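It extends an existing data augmentation approach, which uses a pre-trained language model, to few-shot adaptation in machine translation. It also proposes an experimental setup that simulates novel vocabulary appearing in human-submitted translations, with corresponding evaluation metrics.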
Findings
Few-shot adaptation achieves higher accuracy than traditional models trained on many examples.
Combining data augmentation with randomly selected training sentences yields the highest BLEU score and accuracy.
Effective translation of unseen words is possible with only 1 to 5 examples.
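The accuracy mentioned in these findings is not defined in this summary. As a rough illustration only, one plausible reading is the fraction of test sentences whose translation contains the expected target-side rendering of the novel word. The function name `novel_word_accuracy`, the whole-token matching rule, and the invented word "Zorbel" are all assumptions for the sketch, not the paper's metric.

```python
import re
from typing import Sequence


def novel_word_accuracy(hypotheses: Sequence[str], expected_terms: Sequence[str]) -> float:
    """Fraction of hypotheses that contain the expected translation of the novel word.

    Assumption: each expected term is a single token, matched case-insensitively
    against the word tokens of the corresponding hypothesis.
    """
    if len(hypotheses) != len(expected_terms):
        raise ValueError("hypotheses and expected_terms must have the same length")
    if not hypotheses:
        return 0.0
    hits = 0
    for hypothesis, term in zip(hypotheses, expected_terms):
        # Split on non-word characters so trailing punctuation does not block a match.
        tokens = re.findall(r"\w+", hypothesis.lower())
        if term.lower() in tokens:
            hits += 1
    return hits / len(hypotheses)


# Toy example: "Zorbel" is an invented novel word, not data from the paper.
example_score = novel_word_accuracy(
    ["Der Zorbel ist neu.", "Das Ding ist neu."],
    ["Zorbel", "Zorbel"],
)
```

A metric of this kind complements BLEU: BLEU scores the whole sentence, while a term-level check isolates whether the novel word itself was translated.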
Abstract
Machine translation (MT) models used in industries with constantly changing topics, such as translation or news agencies, need to adapt to new data to maintain their performance over time. Our aim is to teach a pre-trained MT model to translate previously unseen words accurately, based on very few examples. We propose (i) an experimental setup allowing us to simulate novel vocabulary appearing in human-submitted translations, and (ii) corresponding evaluation metrics to compare our approaches. We extend a data augmentation approach using a pre-trained language model to create training examples with similar contexts for novel words. We compare different fine-tuning and data augmentation approaches and show that adaptation on the scale of one to five examples is possible. Combining data augmentation with randomly selected training sentences leads to the highest BLEU score and accuracy…
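The idea of creating training examples with similar contexts for a novel word can be sketched as follows. This is a minimal, monolingual illustration under stated assumptions, not the authors' implementation: the `propose` callable is a hypothetical stand-in for a pre-trained language model that suggests replacements for one masked position, `toy_propose` is a hard-coded lookup used only to make the sketch runnable, and "zorbel" is an invented word. How the paper keeps the target side of each sentence pair consistent is not stated in this summary and is not modeled here.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: given the tokens and a position to mask,
# return candidate replacement tokens for that position.
Proposer = Callable[[Sequence[str], int], List[str]]


def augment_contexts(
    tokens: Sequence[str],
    novel_word: str,
    propose: Proposer,
    max_variants: int,
) -> List[List[str]]:
    """Create sentence variants that keep the novel word but vary one context word each."""
    if max_variants <= 0 or novel_word not in tokens:
        return []
    variants: List[List[str]] = []
    for position, token in enumerate(tokens):
        if token == novel_word:
            continue  # never replace the word the model is supposed to learn
        for candidate in propose(tokens, position):
            if candidate == token or candidate == novel_word:
                continue  # skip no-op substitutions and accidental duplicates of the novel word
            variant = list(tokens)
            variant[position] = candidate
            variants.append(variant)
            if len(variants) >= max_variants:
                return variants
    return variants


# Toy stand-in for a language model (assumption, for illustration only).
TOY_SUBSTITUTES: Dict[str, List[str]] = {
    "released": ["launched", "announced"],
    "new": ["novel", "recent"],
}


def toy_propose(tokens: Sequence[str], position: int) -> List[str]:
    return TOY_SUBSTITUTES.get(tokens[position], [])


sentence = "the company released a new zorbel today".split()
augmented = [" ".join(v) for v in augment_contexts(sentence, "zorbel", toy_propose, 3)]
# augmented == [
#     "the company launched a new zorbel today",
#     "the company announced a new zorbel today",
#     "the company released a novel zorbel today",
# ]
```

In a real setup, `toy_propose` would be replaced by a call to a masked language model, and the augmented sentences would be combined with other training sentences before fine-tuning.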
