Tackling Ambiguity with Images: Improved Multimodal Machine Translation and Contrastive Evaluation
Matthieu Futeral, Cordelia Schmid, Ivan Laptev, Beno\^it Sagot, Rachel, Bawden

TL;DR
This paper introduces a new multimodal machine translation approach that leverages images and novel training techniques to better resolve ambiguity, achieving superior performance on a new contrastive evaluation dataset.
Contribution
The paper proposes a novel MMT model with neural adapters and guided self-attention, and introduces CoMMuTE, a dataset for evaluating disambiguation in multimodal translation.
Findings
Competitive results on standard benchmarks.
Significant improvements on the CoMMuTE contrastive test set.
Outperforms state-of-the-art MMT systems.
Abstract
One of the major challenges of machine translation (MT) is ambiguity, which can in some cases be resolved by accompanying context such as images. However, recent work in multimodal MT (MMT) has shown that obtaining improvements from images is challenging, limited not only by the difficulty of building effective cross-modal representations, but also by the lack of specific evaluation and training data. We present a new MMT approach based on a strong text-only MT model, which uses neural adapters, a novel guided self-attention mechanism and which is jointly trained on both visually-conditioned masking and MMT. We also introduce CoMMuTE, a Contrastive Multilingual Multimodal Translation Evaluation set of ambiguous sentences and their possible translations, accompanied by disambiguating images corresponding to each translation. Our approach obtains competitive results compared to strong…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques · Translation Studies and Practices · Multimodal Machine Learning Applications
MethodsTest
