Crossing the Threshold: Idiomatic Machine Translation through Retrieval Augmentation and Loss Weighting
Emmy Liu, Aditi Chaudhary, Graham Neubig

TL;DR
This paper characterizes idiomatic translation, uses a synthetic experiment to study when transformer-based models translate idioms correctly, and proposes retrieval augmentation and loss weighting, which improve accuracy on idiomatic sentences by up to 13% absolute.
Contribution
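The paper introduces a dataset of about 4k natural sentences containing idiomatic expressions in French, Finnish, and Japanese, and shows that upweighting training loss on potentially idiomatic sentences and using retrieval-augmented models both improve idiomatic translation in transformer-based models.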
Findings
Up to 13% absolute accuracy improvement on idiomatic sentences
Identification of a tipping point for correct idiomatic translation
Potential benefits for non-idiomatic sentence translation
Abstract
Idioms are common in everyday language, but often pose a challenge to translators because their meanings do not follow from the meanings of their parts. Despite significant advances, machine translation systems still struggle to translate idiomatic expressions. We provide a simple characterization of idiomatic translation and related issues. This allows us to conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations. To expand multilingual resources, we compile a dataset of ~4k natural sentences containing idiomatic expressions in French, Finnish, and Japanese. To improve translation of natural idioms, we introduce two straightforward yet effective techniques: the strategic upweighting of training loss on potentially idiomatic sentences, and using retrieval-augmented models. This not only…
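The loss-upweighting idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `idiom_weight` value, and the choice to average over the batch size are all assumptions made for illustration. Each sentence's negative log-likelihood is multiplied by a larger weight when the sentence is flagged as potentially idiomatic.

```python
import math


def sentence_nll(token_probs):
    """Negative log-likelihood of one target sentence.

    token_probs: model probabilities assigned to each reference token.
    """
    return -sum(math.log(p) for p in token_probs)


def weighted_batch_loss(batch, idiom_weight=2.0):
    """Mean sentence loss with potentially idiomatic sentences upweighted.

    batch: list of (token_probs, is_potentially_idiomatic) pairs.
    idiom_weight: hypothetical multiplier, not a value taken from the paper.
    """
    total = 0.0
    for token_probs, is_idiomatic in batch:
        # Flagged sentences contribute more to the training signal.
        weight = idiom_weight if is_idiomatic else 1.0
        total += weight * sentence_nll(token_probs)
    # Assumption: normalize by batch size rather than by the sum of weights.
    return total / len(batch)


# Toy batch: one flagged sentence (two tokens) and one unflagged (one token).
toy_batch = [([0.5, 0.5], True), ([0.5], False)]
upweighted = weighted_batch_loss(toy_batch, idiom_weight=2.0)
baseline = weighted_batch_loss(toy_batch, idiom_weight=1.0)
```

With `idiom_weight=1.0` the function reduces to an ordinary unweighted mean, so the weight acts as a single knob controlling how strongly flagged sentences influence training.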
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Translation Studies and Practices
