OSU Multimodal Machine Translation System Report
Mingbo Ma, Dapeng Li, Kai Zhao, Liang Huang

TL;DR
This paper presents OSU's multimodal machine translation system that leverages shared images to improve translation quality for image caption datasets, achieving top TER results in English-German translation on MSCOCO.
Contribution
Introduces a simple multimodal translation system using shared images for encoding and decoding, enhancing translation performance on caption datasets.
Findings
Achieved best TER score for English-German on MSCOCO
Evaluated for English-French and English-German on Flickr30K (in-domain) and MSCOCO (out-of-domain)
Utilizes shared images to improve translation accuracy
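TER (translation edit rate) counts the edits needed to turn a system output into the reference, divided by the reference length, so lower is better. The sketch below is a simplified illustration, not the official scorer used in the shared task: it counts only word-level insertions, deletions, and substitutions and omits the block-shift edit that full TER also allows, so it can overestimate the real score. The function names are hypothetical.

```python
def word_edit_distance(hyp_tokens, ref_tokens):
    """Word-level Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(ref_tokens) + 1))
    for i, h in enumerate(hyp_tokens, 1):
        curr = [i]
        for j, r in enumerate(ref_tokens, 1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def simplified_ter(hypothesis, reference):
    """Edit distance divided by reference length (no shift edits)."""
    hyp_tokens = hypothesis.split()
    ref_tokens = reference.split()
    if not ref_tokens:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(hyp_tokens, ref_tokens) / len(ref_tokens)
```

For example, `simplified_ter("a man rides a bike", "a man rides a bicycle")` needs one substitution against a five-word reference and returns 0.2.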
Abstract
This paper describes Oregon State University's submissions to the shared WMT'17 task "multimodal translation task I". In this task, all the sentence pairs are image captions in different languages. The key difference between this task and conventional machine translation is that we have corresponding images as additional information for each sentence pair. In this paper, we introduce a simple but effective system which takes an image shared between different languages and feeds it into both the encoding and decoding sides. We report our system's performance for English-French and English-German on the Flickr30K (in-domain) and MSCOCO (out-of-domain) datasets. Our system achieves the best TER performance for English-German on the MSCOCO dataset.
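The abstract does not say how the image enters the encoder and decoder, so the following is only a minimal sketch of one plausible wiring, not the authors' architecture. It assumes a plain tanh RNN, a precomputed image feature vector, and a projection of that vector used as the initial encoder state and mixed again into the initial decoder state. All names (`translate_states`, `W_img`, `enc_Wx`, and so on) are hypothetical.

```python
import math
import random


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rnn_step(x, h, W_x, W_h):
    """One tanh RNN step: h' = tanh(W_x x + W_h h)."""
    return [math.tanh(a + b) for a, b in zip(matvec(W_x, x), matvec(W_h, h))]


def random_matrix(rows, cols, rng):
    """Small random weights, for illustration only."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def translate_states(src_embs, tgt_embs, image_feat, p):
    """Return decoder hidden states conditioned on the image on both sides."""
    # Project the image feature into the hidden space (assumed design).
    img_h = [math.tanh(v) for v in matvec(p["W_img"], image_feat)]

    # Encoder side: the projected image initialises the encoder state.
    h = img_h
    for x in src_embs:
        h = rnn_step(x, h, p["enc_Wx"], p["enc_Wh"])

    # Decoder side: the image is combined with the encoder summary again.
    s = [math.tanh(a + b) for a, b in zip(h, img_h)]
    states = []
    for y in tgt_embs:
        s = rnn_step(y, s, p["dec_Wx"], p["dec_Wh"])
        states.append(s)
    return states
```

Because the same projected vector is used on both sides, the sketch does not depend on the caption language, which matches the idea of an image shared between languages. A real system would add attention, an output layer, and trained weights.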
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
