# Probing the Need for Visual Context in Multimodal Machine Translation

**Authors:** Ozan Caglayan, Pranava Madhyastha, Lucia Specia, Loïc Barrault

arXiv: 1903.08678 · 2019-06-04

## TL;DR

This paper probes whether visual context helps multimodal machine translation (MMT). By partially removing source-side textual context, it shows that models do use the image when the text alone is insufficient, challenging the view that the visual modality is unnecessary or only marginally beneficial.

## Contribution

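The study provides a systematic analysis in which state-of-the-art MMT models are partially deprived of source-side textual context. It shows that visual input improves translation quality under limited textual context, contradicting the belief that MMT models disregard the visual modality because of the quality of the image features or the way they are integrated.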

## Key findings

- The visual modality improves translations when source-side textual context is limited
- State-of-the-art MMT models can use the image to compensate for missing source text
- The results challenge the belief that MMT models ignore visual input because of image feature quality or the integration method

## Abstract

Current work on multimodal machine translation (MMT) has suggested that the visual modality is either unnecessary or only marginally beneficial. We posit that this is a consequence of the very simple, short and repetitive sentences used in the only available dataset for the task (Multi30K), rendering the source text sufficient as context. In the general case, however, we believe that it is possible to combine visual and textual information in order to ground translations. In this paper we probe the contribution of the visual modality to state-of-the-art MMT models by conducting a systematic analysis where we partially deprive the models from source-side textual context. Our results show that under limited textual context, models are capable of leveraging the visual input to generate better translations. This contradicts the current belief that MMT models disregard the visual modality because of either the quality of the image features or the way they are integrated into the model.
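
To make the idea of partially depriving a model of source-side textual context concrete, here is a minimal sketch of two possible degradation schemes applied to a tokenized source sentence. The function names, the `[v]` placeholder token, and the example sentence are illustrative assumptions rather than details taken from this summary; the complete paper describes the schemes the authors actually used.

```python
from typing import Iterable, List

# Hypothetical placeholder that stands in for removed source words.
MASK = "[v]"


def progressive_mask(tokens: List[str], keep: int, mask: str = MASK) -> List[str]:
    """Keep the first `keep` tokens and replace every later token with `mask`."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    return tokens[:keep] + [mask] * max(0, len(tokens) - keep)


def mask_words(tokens: List[str], targets: Iterable[str], mask: str = MASK) -> List[str]:
    """Replace every token found in `targets` (case-insensitive) with `mask`."""
    targets_lower = {t.lower() for t in targets}
    return [mask if tok.lower() in targets_lower else tok for tok in tokens]


if __name__ == "__main__":
    sentence = "a man in a red shirt rides a bike".split()
    # Truncate the textual context after the first three words.
    print(" ".join(progressive_mask(sentence, keep=3)))
    # Hide a single visually grounded word.
    print(" ".join(mask_words(sentence, {"red"})))
```

Training and evaluating both a text-only and a multimodal system on inputs degraded this way would show whether the multimodal system recovers the hidden information from the image.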

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.08678/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/1903.08678/full.md

## References

33 references — full list in the complete paper: https://tomesphere.com/paper/1903.08678/full.md

---
Source: https://tomesphere.com/paper/1903.08678