Unsupervised Multimodal Neural Machine Translation with Pseudo Visual Pivoting
Po-Yao Huang, Junjie Hu, Xiaojun Chang, Alexander Hauptmann

TL;DR
This paper introduces an unsupervised multimodal neural machine translation model that leverages visual content and pseudo visual pivoting to improve language translation accuracy without requiring images during testing.
Contribution
It proposes a novel approach combining multimodal back-translation and pseudo visual pivoting to enhance unsupervised multimodal translation in a shared visual-semantic space.
Findings
Significant improvement over state-of-the-art methods on the Multi30K dataset.
Effective even when images are not available during testing.
Demonstrates the potential of visual content in unsupervised translation.
Abstract
Unsupervised machine translation (MT) has recently achieved impressive results with monolingual corpora only. However, it is still challenging to associate source-target sentences in the latent space. As people who speak different languages biologically share similar visual systems, the potential of achieving better alignment through visual content is promising yet under-explored in unsupervised multimodal MT (MMT). In this paper, we investigate how to utilize visual content for disambiguation and for promoting latent space alignment in unsupervised MMT. Our model employs multimodal back-translation and features pseudo visual pivoting, in which we learn a shared multilingual visual-semantic embedding space and incorporate visually-pivoted captioning as additional weak supervision. The experimental results on the widely used Multi30K dataset show that the proposed model significantly improves…
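To make the idea of a shared visual-semantic embedding space concrete, here is a minimal NumPy sketch of a max-margin contrastive loss, a common objective for aligning images and sentences. It is an illustration under stated assumptions, not the authors' implementation: the function names, the margin value of 0.2, and the sum-of-hinges form are all assumptions, and the inputs stand in for the outputs of hypothetical image and sentence encoders.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Scale each row to (approximately) unit length so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def hinge_contrastive_loss(img, txt, margin=0.2):
    """Max-margin loss over a batch where img[i] and txt[i] are a matched pair.

    img, txt: arrays of shape (batch, dim), assumed to come from
    hypothetical image and sentence encoders. margin is an assumed value.
    """
    img, txt = l2_normalize(img), l2_normalize(txt)
    sim = img @ txt.T            # sim[i, j]: similarity of image i and sentence j
    pos = np.diag(sim)           # similarities of the matched pairs
    # Penalize wrong sentences that score within `margin` of the right one.
    cost_txt = np.maximum(0.0, margin + sim - pos[:, None])
    # Penalize wrong images that score within `margin` of the right one.
    cost_img = np.maximum(0.0, margin + sim - pos[None, :])
    # Matched pairs are not negatives for themselves.
    np.fill_diagonal(cost_txt, 0.0)
    np.fill_diagonal(cost_img, 0.0)
    return float(cost_txt.sum() + cost_img.sum())
```

In a multilingual setting, one could apply the same loss to (image, source-language sentence) and (image, target-language sentence) batches with a shared image encoder, so that both languages are pulled toward the same visual anchors; this usage is likewise a sketch, not a description of the paper's exact training recipe.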
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
