Beyond Triplet: Leveraging the Most Data for Multimodal Machine Translation
Yaoming Zhu, Zewei Sun, Shanbo Cheng, Luyang Huang, Liwei Wu, Mingxuan Wang

TL;DR
This paper introduces a new framework and dataset for multimodal machine translation that leverages large-scale non-triple data, improving translation quality in realistic scenarios and outperforming existing models.
Contribution
It proposes a 2/3-Triplet framework utilizing monolingual and parallel text data, and constructs the EMMT dataset for more practical evaluation of MMT systems.
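To make the data categories concrete, here is a minimal Python sketch of how samples might be sorted into full triples (bilingual text with an image), monolingual image-text pairs, and parallel text-only pairs. The `Sample` schema, the field names, and the `data_kind` helper are hypothetical illustrations, not part of the paper's released code or of the EMMT format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    """One training example; any field may be missing (hypothetical schema)."""
    source_text: Optional[str] = None  # e.g. an English sentence
    target_text: Optional[str] = None  # e.g. its Chinese translation
    image_path: Optional[str] = None   # path to a paired image


def data_kind(sample: Sample) -> str:
    """Label a sample by which of the three components it carries."""
    has_src = sample.source_text is not None
    has_tgt = sample.target_text is not None
    has_img = sample.image_path is not None
    if has_src and has_tgt and has_img:
        return "triple"                  # bilingual text plus image
    if has_src and has_tgt:
        return "parallel_text_only"      # bilingual text, no image
    if has_img and (has_src or has_tgt):
        return "monolingual_image_text"  # one language plus image
    return "unusable"                    # fewer than two components


if __name__ == "__main__":
    print(data_kind(Sample("a red dress", "红色连衣裙", "dress.jpg")))
    print(data_kind(Sample("a red dress", "红色连衣裙")))
    print(data_kind(Sample(source_text="a red dress", image_path="dress.jpg")))
```

Under this assumed schema, every sample carrying at least two of the three components is usable for training, which is one plausible reading of the "2/3" in the framework's name.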
Findings
Significant performance improvement with non-triple data.
Outperforms state-of-the-art models on benchmarks.
Better suitability for real-world applications.
Abstract
Multimodal machine translation (MMT) aims to improve translation quality by incorporating information from other modalities, such as vision. Previous MMT systems mainly focus on better access and use of visual information and tend to validate their methods on image-related datasets. These studies face two challenges. First, they can only utilize triple data (bilingual texts with images), which is scarce; second, current benchmarks are relatively restricted and do not correspond to realistic scenarios. Therefore, this paper correspondingly establishes new methods and new datasets for MMT. First, we propose a framework 2/3-Triplet with two new approaches to enhance MMT by utilizing large-scale non-triple data: monolingual image-text data and parallel text-only data. Second, we construct an English-Chinese e-commercial multimodal translation dataset (including training and…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Translation Studies and Practices
Methods: Test
