Neural Machine Translation with Phrase-Level Universal Visual Representations

Qingkai Fang; Yang Feng

arXiv:2203.10299·cs.CL·March 22, 2022

TL;DR

This paper introduces a phrase-level retrieval approach for multimodal machine translation. It retrieves visual information for the source sentence from existing sentence-image datasets, so translation no longer requires a paired image as input, and it filters the retrieved visual representations with a conditional variational auto-encoder.

Contribution

The paper proposes a phrase-level retrieval method combined with a conditional variational auto-encoder. Retrieval at the phrase level mitigates data sparsity, and the auto-encoder filters out redundant visual information, together improving multimodal translation.
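The conditional variational auto-encoder mentioned above rests on two standard ingredients: a reparameterized sample of a latent vector z from a posterior q(z | phrase, region), and a KL term that pulls this posterior toward a prior p(z | phrase). The sketch below shows only those two generic pieces, assuming both networks output a mean and a log-variance per dimension. The network architectures, loss weighting, and how the paper uses z at inference time are not described on this page; the function names and example values are hypothetical.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], logvar: Sequence[float], rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    This is the reparameterization trick; in an autograd framework it keeps
    the sample differentiable with respect to mu and logvar.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def gaussian_kl(
    mu_q: Sequence[float],
    logvar_q: Sequence[float],
    mu_p: Sequence[float],
    logvar_p: Sequence[float],
) -> float:
    """KL(N(mu_q, diag(exp(logvar_q))) || N(mu_p, diag(exp(logvar_p)))), summed over dimensions."""
    return 0.5 * sum(
        lv_p - lv_q + (math.exp(lv_q) + (m_q - m_p) ** 2) / math.exp(lv_p) - 1.0
        for m_q, lv_q, m_p, lv_p in zip(mu_q, logvar_q, mu_p, logvar_p)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical outputs of a posterior network q(z | phrase, region)
    # and a prior network p(z | phrase).
    mu_q, logvar_q = [1.0, 0.0], [0.0, 0.0]
    mu_p, logvar_p = [0.0, 0.0], [0.0, 0.0]
    z = reparameterize(mu_q, logvar_q, rng)
    print(len(z), gaussian_kl(mu_q, logvar_q, mu_p, logvar_p))  # prints: 2 0.5
```

In a typical CVAE the KL term acts as a bottleneck: information in the region features that the phrase-conditioned prior cannot account for is penalized, which is one way to read the "filter redundant visual information" claim.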

Findings

01

Significant performance improvements over baselines on multiple datasets

02

Effective handling of limited textual context

03

Mitigation of data sparsity issues in multimodal translation

Abstract

Multimodal machine translation (MMT) aims to improve neural machine translation (NMT) with additional visual information, but most existing MMT methods require paired input of a source sentence and an image, which makes them suffer from a shortage of sentence-image pairs. In this paper, we propose a phrase-level retrieval-based method for MMT that obtains visual information for the source input from existing sentence-image data sets, so that MMT is no longer limited to paired sentence-image input. Our method performs retrieval at the phrase level and hence learns visual information from pairs of source phrases and grounded regions, which can mitigate data sparsity. Furthermore, our method employs a conditional variational auto-encoder to learn visual representations that filter out redundant visual information and retain only the visual information related to the phrase. Experiments show that the…
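The phrase-level retrieval described in the abstract can be illustrated with a toy lookup: index every grounded phrase in a sentence-image corpus by the region features it was paired with, then match phrases in a new source sentence against that index. The sketch below is a simplified illustration, not the method in the ictnlp/pluvr repository. It assumes exact lowercase string matching, greedy longest-match over n-grams, and averaging of the retrieved region features; all names and the toy feature vectors are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def build_phrase_index(grounded_pairs: Sequence[Tuple[str, Vector]]) -> Dict[str, List[Vector]]:
    """Map each lowercased grounded phrase to every region feature it was paired with."""
    index: Dict[str, List[Vector]] = defaultdict(list)
    for phrase, region_feature in grounded_pairs:
        index[phrase.lower()].append(region_feature)
    return dict(index)


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized feature vectors (assumed aggregation)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def retrieve_phrase_visuals(
    sentence: str, index: Dict[str, List[Vector]], max_n: int = 3
) -> List[Tuple[str, Vector]]:
    """Greedy longest-match over n-grams; returns (phrase, averaged region feature) pairs."""
    tokens = sentence.lower().split()
    results: List[Tuple[str, Vector]] = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram starting at position i first.
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i : i + n])
            if phrase in index:
                results.append((phrase, mean_vector(index[phrase])))
                i += n
                break
        else:
            # No indexed phrase starts here; move to the next token.
            i += 1
    return results


if __name__ == "__main__":
    pairs = [("a dog", [1.0, 0.0]), ("a dog", [0.0, 1.0]), ("frisbee", [0.5, 0.5])]
    index = build_phrase_index(pairs)
    print(retrieve_phrase_visuals("A dog catches a frisbee", index))
    # prints: [('a dog', [0.5, 0.5]), ('frisbee', [0.5, 0.5])]
```

Because the lookup key is a phrase rather than a whole sentence, a new sentence can reuse visual evidence from many different training images, which is the intuition behind the data-sparsity argument in the abstract.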

Code & Models

Repositories

ictnlp/pluvr
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Topic Modeling