Zero-resource Machine Translation by Multimodal Encoder-decoder Network with Multimedia Pivot
Hideki Nakayama, Noriki Nishida

TL;DR
This paper introduces a zero-resource neural machine translation method that leverages multimodal embedded representations over texts and images, enabling translation without parallel corpora by using multimedia as a pivot.
Contribution
It presents a novel multimodal encoder-decoder network that uses multimedia as a pivot to perform translation without supervised parallel data, bridging different modalities in a shared semantic space.
Findings
Achieved reasonable translation performance on benchmark datasets.
The end-to-end model trained with a combined rank and cross-entropy loss performed best.
Multimodal representations effectively bridge language gaps without parallel corpora.
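The combined objective mentioned in the findings can be illustrated with a minimal sketch. This is not the authors' implementation: the hinge-style ranking term, the one-directional contrast (text as anchor), the `margin` and `alpha` parameters, and all function names are assumptions made for illustration only.

```python
import numpy as np

def pairwise_rank_loss(text_emb, image_emb, margin=0.1):
    """Hinge ranking loss over a batch of embeddings in the shared space.

    Assumption: row i of text_emb is paired with row i of image_emb.
    """
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    sim = t @ v.T                       # cosine similarities, shape (B, B)
    pos = np.diag(sim)[:, None]         # similarity of each matched pair
    # Penalize mismatched pairs that score within `margin` of the matched pair.
    cost = np.maximum(0.0, margin - pos + sim)
    np.fill_diagonal(cost, 0.0)         # matched pairs contribute no penalty
    return cost.sum() / len(sim)

def cross_entropy(logits, targets):
    """Mean token-level cross-entropy. logits: (T, V); targets: (T,) integer ids."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

def combined_loss(text_emb, image_emb, logits, targets, alpha=1.0, margin=0.1):
    """Hypothetical weighted sum of the ranking term and the decoder term."""
    return (pairwise_rank_loss(text_emb, image_emb, margin)
            + alpha * cross_entropy(logits, targets))
```

The ranking term pulls matched text and image embeddings together in the common space, while the cross-entropy term trains the decoder on top of that space. How the two terms are actually weighted in the paper is not stated in this summary.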
Abstract
We propose an approach to build a neural machine translation system with no supervised resources (i.e., no parallel corpora) using multimodal embedded representation over texts and images. Based on the assumption that text documents are often likely to be described with other multimedia information (e.g., images) somewhat related to the content, we try to indirectly estimate the relevance between two languages. Using multimedia as the "pivot", we project all modalities into one common hidden space where samples belonging to similar semantic concepts should come close to each other, whatever the observed space of each sample is. This modality-agnostic representation is the key to bridging the gap between different modalities. Putting a decoder on top of it, our network can flexibly draw the outputs from any input modality. Notably, in the testing phase, we need only source language texts…
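The idea of a modality-agnostic hidden space with a single decoder on top can be sketched in a few lines. Everything here is a toy assumption: the random linear maps stand in for real encoder networks, the feature and hidden dimensions are arbitrary, and the decoder emits one token index rather than a sentence.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 8        # size of the common hidden space (hypothetical)
TARGET_VOCAB = 5  # size of the target-language vocabulary (hypothetical)

# One encoder per modality; random linear maps stand in for trained networks.
encoders = {
    "source_text": rng.normal(size=(16, HIDDEN)),  # 16-dim text features
    "image": rng.normal(size=(32, HIDDEN)),        # 32-dim image features
}
# A single decoder shared by every modality.
decoder = rng.normal(size=(HIDDEN, TARGET_VOCAB))

def encode(modality, features):
    """Project features from any modality into the common hidden space."""
    h = features @ encoders[modality]
    return h / np.linalg.norm(h)  # unit length, so modalities are comparable

def decode(hidden):
    """Pick the highest-scoring target token for a hidden vector."""
    return int(np.argmax(hidden @ decoder))

# The decoder never needs to know which modality produced the hidden vector.
token_from_image = decode(encode("image", np.ones(32)))
token_from_text = decode(encode("source_text", np.ones(16)))
```

Because `decode` only sees a vector in the common space, a decoder trained from one modality can in principle be driven by another at test time, which is the property the abstract relies on.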
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
