On Leveraging the Visual Modality for Neural Machine Translation

Vikas Raunak; Sang Keun Choe; Quanyang Lu; Yi Xu; Florian Metze

arXiv:1910.02754 · cs.CL · October 8, 2019

TL;DR

This paper investigates the role of visual information in neural machine translation using a larger, more complex dataset, proposing new fusion methods but finding limited benefits due to the quality of visual embeddings.

Contribution

It introduces three novel fusion techniques for integrating visual context in NMT and analyzes the impact of visual embedding quality on translation performance.

Findings

01. Marginal gains from visual context in large-scale datasets

02. Visual embeddings' discriminativeness is insufficient for improved translation

03. Quality of visual embeddings is crucial for effective multimodal NMT

Abstract

Leveraging the visual modality effectively for Neural Machine Translation (NMT) remains an open problem in computational linguistics. Recently, Caglayan et al. posited that the observed gains are limited mainly due to the very simple, short, repetitive sentences of the Multi30k dataset (the only multimodal MT dataset available at the time), which renders the source text sufficient for context. In this work, we further investigate this hypothesis on a new large-scale multimodal Machine Translation (MMT) dataset, How2, which has 1.57 times longer mean sentence length than Multi30k and no repetition. We propose and evaluate three novel fusion techniques, each of which is designed to ensure the utilization of visual context at different stages of the Sequence-to-Sequence transduction pipeline, even under full linguistic context. However, we still obtain only marginal gains under full…

Taxonomy

Methods: Average Pooling · ResNeXt Block · Grouped Convolution · Global Average Pooling · Residual Connection · Kaiming Initialization · 1x1 Convolution · Convolution · Batch Normalization