VALHALLA: Visual Hallucination for Machine Translation

Yi Li; Rameswar Panda; Yoon Kim; Chun-Fu Chen; Rogerio Feris; David Cox; Nuno Vasconcelos

arXiv:2206.00100·cs.CV·June 2, 2022

TL;DR

VALHALLA is a multimodal machine translation approach that hallucinates visual representations from the source text, so translation needs no paired images at inference time, which broadens its real-world applicability.

Contribution

The paper proposes a visual hallucination framework for machine translation that predicts visual features from text, eliminating the need for paired images at inference time.

Findings

01. Outperforms text-only baselines on multiple datasets

02. Achieves competitive results compared to multimodal methods that use paired images

03. Demonstrates robustness across diverse language pairs

Abstract

Designing better machine translation systems by considering auxiliary inputs such as images has attracted much attention in recent years. While existing methods show promising performance over the conventional text-only translation systems, they typically require paired text and image as input during inference, which limits their applicability to real-world scenarios. In this paper, we introduce a visual hallucination framework, called VALHALLA, which requires only source sentences at inference time and instead uses hallucinated visual representations for multimodal machine translation. In particular, given a source sentence, an autoregressive hallucination transformer is used to predict a discrete visual representation from the input text, and the combined text and hallucinated representations are utilized to obtain the target translation. We train the hallucination transformer jointly…
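
The inference-time data flow described in the abstract (source sentence, then autoregressively predicted discrete visual tokens, then a combined input for translation) can be illustrated with a small toy sketch. This is not the authors' implementation: the scoring function is a seeded pseudo-random stand-in for a trained hallucination transformer, greedy decoding and plain concatenation are assumptions of this sketch, and every name and size (toy_score_fn, TEXT_VOCAB_SIZE, VISUAL_VOCAB_SIZE, NUM_VISUAL_TOKENS) is hypothetical.

```python
import random
from typing import Callable, List, Sequence

# All sizes and names below are hypothetical, chosen only for illustration.
TEXT_VOCAB_SIZE = 100    # size of the (toy) text vocabulary
VISUAL_VOCAB_SIZE = 16   # size of the (toy) discrete visual codebook
NUM_VISUAL_TOKENS = 4    # length of the hallucinated visual sequence

ScoreFn = Callable[[Sequence[int], Sequence[int]], List[float]]


def toy_score_fn(src_ids: Sequence[int], prefix: Sequence[int]) -> List[float]:
    """Toy stand-in for a trained hallucination transformer.

    Returns one score per visual codebook entry, conditioned on the source
    tokens and the visual tokens generated so far. The scores are seeded
    pseudo-random numbers, not learned predictions.
    """
    seed = sum((i + 1) * t for i, t in enumerate(list(src_ids) + list(prefix)))
    rng = random.Random(seed)
    return [rng.random() for _ in range(VISUAL_VOCAB_SIZE)]


def hallucinate_visual_tokens(
    src_ids: Sequence[int],
    score_fn: ScoreFn,
    length: int = NUM_VISUAL_TOKENS,
) -> List[int]:
    """Greedy autoregressive decoding of discrete visual tokens from text."""
    visual: List[int] = []
    for _ in range(length):
        scores = score_fn(src_ids, visual)
        # Pick the highest-scoring codebook entry and feed it back as context.
        visual.append(max(range(len(scores)), key=scores.__getitem__))
    return visual


def build_multimodal_input(
    src_ids: Sequence[int], visual_ids: Sequence[int]
) -> List[int]:
    """Combine text and hallucinated tokens into one sequence.

    Visual ids are shifted past the text vocabulary so the two id spaces
    do not collide. Concatenation is an assumption of this sketch.
    """
    return list(src_ids) + [TEXT_VOCAB_SIZE + v for v in visual_ids]


src = [5, 17, 42]  # toy source-sentence token ids
visual = hallucinate_visual_tokens(src, toy_score_fn)
model_input = build_multimodal_input(src, visual)
print(visual, model_input)
```

Only the source token ids are required as input here, which mirrors the claim that no paired image is needed at inference time; in a real system the combined sequence would be passed to a translation model rather than printed.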

Code & Models

Repositories

jerryyli/valhalla-nmt
pytorch

Taxonomy

Topics: Cell Image Analysis Techniques