Multimodal Machine Translation with Visual Scene Graph Pruning
Chenyu Lu, Shiliang Sun, Jing Zhao, Nan Zhang, Tengfei Song, Hao Yang

TL;DR
This paper introduces a novel multimodal machine translation approach that uses visual scene graph pruning to reduce noise and redundancy in visual data, improving translation quality.
Contribution
It proposes a new method leveraging language scene graph information to prune visual scene graphs, addressing visual data redundancy in multimodal translation.
Findings
PSG, the proposed scene graph pruning model, outperforms state-of-the-art methods in the reported comparative experiments.
Visual scene graph pruning reduces noise in translation tasks.
Ablation studies confirm the effectiveness of pruning.
Abstract
Multimodal machine translation (MMT) seeks to address the challenges posed by linguistic polysemy and ambiguity in translation tasks by incorporating visual information. A key bottleneck in current MMT research is the effective utilization of visual data. Previous approaches have focused on extracting global or region-level image features and using attention or gating mechanisms for multimodal information fusion. However, these methods have not adequately tackled the issue of visual information redundancy in MMT, nor have they proposed effective solutions. In this paper, we introduce a novel approach--multimodal machine translation with visual Scene Graph Pruning (PSG), which leverages language scene graph information to guide the pruning of redundant nodes in visual scene graphs, thereby reducing noise in downstream translation tasks. Through extensive comparative experiments with…
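The pruning idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how nodes are scored or selected, so the cosine-similarity test, the `threshold` value, the `SceneGraph` class, the `prune_visual_graph` function, and the toy embeddings below are all hypothetical choices made for illustration. The sketch keeps a visual node only if it is close to at least one language node, then drops any edge that lost an endpoint.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SceneGraph:
    # Hypothetical container: node label -> toy embedding vector.
    nodes: Dict[str, List[float]]
    # Edges as (subject, relation, object) triples over node labels.
    edges: List[Tuple[str, str, str]] = field(default_factory=list)


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def prune_visual_graph(visual: SceneGraph, language: SceneGraph,
                       threshold: float = 0.8) -> SceneGraph:
    """Keep visual nodes similar to some language node (assumed criterion)."""
    keep = {
        label
        for label, vec in visual.nodes.items()
        if any(cosine(vec, lang_vec) >= threshold
               for lang_vec in language.nodes.values())
    }
    nodes = {label: vec for label, vec in visual.nodes.items() if label in keep}
    # An edge survives only if both of its endpoints survive.
    edges = [(s, r, o) for s, r, o in visual.edges if s in keep and o in keep]
    return SceneGraph(nodes, edges)


# Toy data: the sentence mentions a man and a horse; the image also has a tree.
language = SceneGraph(
    nodes={"man": [1.0, 0.0, 0.0], "horse": [0.0, 1.0, 0.0]},
    edges=[("man", "riding", "horse")],
)
visual = SceneGraph(
    nodes={
        "person": [0.9, 0.1, 0.0],
        "horse": [0.0, 1.0, 0.0],
        "tree": [0.0, 0.0, 1.0],
    },
    edges=[("person", "on", "horse"), ("horse", "near", "tree")],
)

pruned = prune_visual_graph(visual, language)
print(sorted(pruned.nodes))
print(pruned.edges)
```

In this toy case the "tree" node has no counterpart in the language graph, so it and the edge that touches it are removed, while "person" and "horse" are retained. A real system would presumably use learned node representations rather than hand-written vectors.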
Taxonomy
Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Topic Modeling
Methods: Softmax · Attention Is All You Need · Pruning
