TL;DR
This paper introduces a novel Dynamic Context-guided Capsule Network (DCCN) that adaptively models visual features at different granularities to improve multimodal machine translation, outperforming baseline models on the Multi30K dataset.
Contribution
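The paper proposes a capsule network whose dynamic routing is guided by a timestep-specific source-side context vector, so the visual context used by the decoder can change at every decoding step. This addresses a limitation of global context modeling and multimodal joint representation learning, which both supply a fixed visual context.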
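For readers unfamiliar with capsule routing, here is a minimal sketch of the standard routing-by-agreement procedure of Sabour et al. (2017), which capsule networks build on. It is not DCCN's context-guided variant. This summary does not give the exact form in which the source-side context enters the routing, so the sketch omits it. The helper names (`squash`, `dynamic_routing`, `norm`) and the plain-list vector representation are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def squash(s: Vector, eps: float = 1e-9) -> Vector:
    """Scale s to a length in [0, 1) while keeping its direction."""
    sq_norm = sum(x * x for x in s)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm + eps)
    return [scale * x for x in s]


def softmax(logits: Vector) -> Vector:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat: List[List[Vector]], iterations: int = 3) -> List[Vector]:
    """u_hat[i][j] is input capsule i's prediction vector for output capsule j."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    v: List[Vector] = []
    for _ in range(iterations):
        # Coupling coefficients: each input distributes its output over all outputs.
        c = [softmax(row) for row in b]
        # Each output capsule is the squashed, coupling-weighted sum of predictions.
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][k] for i in range(n_in)) for k in range(dim)]
            v.append(squash(s_j))
        # Raise the logit wherever a prediction agrees with the resulting output.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(p * q for p, q in zip(u_hat[i][j], v[j]))
    return v


if __name__ == "__main__":
    u_hat = [
        [[1.0, 0.0], [0.0, 1.0]],   # input capsule 0: predictions for outputs 0 and 1
        [[1.0, 0.0], [0.0, -1.0]],  # input capsule 1: agrees on output 0, opposes on 1
    ]
    v = dynamic_routing(u_hat)
    # Output 0 (agreement) should end up longer than output 1 (cancellation).
    print([round(norm(vj), 3) for vj in v])
```

The vector length acts as the capsule's activation. Both inputs agree on output 0 and cancel on output 1, so routing shifts their coupling toward output 0, and that output's vector grows across iterations.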
Findings
DCCN outperforms baseline models on the Multi30K dataset
The model effectively integrates global and regional visual features
Experimental results show improved translation quality
Abstract
Multimodal machine translation (MMT), which mainly focuses on enhancing text-only translation with visual features, has attracted considerable attention from both the computer vision and natural language processing communities. Most current MMT models resort to the attention mechanism, global context modeling or multimodal joint representation learning to utilize visual features. However, the attention mechanism lacks sufficient semantic interactions between modalities, while the other two provide a fixed visual context, which is unsuitable for modeling the observed variability when generating a translation. To address these issues, we propose a novel Dynamic Context-guided Capsule Network (DCCN) for MMT. Specifically, at each timestep of decoding, we first employ the conventional source-target attention to produce a timestep-specific source-side context vector. Next, DCCN takes…
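The abstract's first step computes a source-side context vector at every decoding timestep. That step can be sketched as single-head scaled dot-product attention in which the current decoder state queries the encoder states. This is an illustrative assumption: the summary does not say how many heads or which learned projections the model uses. `source_context` is a hypothetical name, and the encoder states serve directly as both keys and values.

```python
import math
from typing import List, Tuple

Vector = List[float]


def source_context(query: Vector, enc_states: List[Vector]) -> Tuple[Vector, Vector]:
    """Hypothetical helper: attend from one decoder state to all encoder states.

    Single-head scaled dot-product attention with no learned projections
    (an assumption). Returns (context_vector, attention_weights).
    """
    d = len(query)
    # Similarity of the decoder query to each source position, scaled by sqrt(d).
    scores = [sum(q * h for q, h in zip(query, hs)) / math.sqrt(d) for hs in enc_states]
    # Numerically stable softmax over source positions.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the encoder states.
    context = [sum(w * hs[k] for w, hs in zip(weights, enc_states)) for k in range(d)]
    return context, weights


if __name__ == "__main__":
    encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    # One query per decoding timestep, so each step gets its own context vector.
    decoder_states = [[3.0, -1.0], [-1.0, 3.0]]
    for t, s_t in enumerate(decoder_states):
        ctx, attn = source_context(s_t, encoder_states)
        print(t, [round(x, 3) for x in ctx], [round(a, 3) for a in attn])
```

Because the query changes with the decoder state, the context vector differs from step to step. According to the abstract, DCCN then uses this timestep-specific vector as its input.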
Taxonomy
Methods: Capsule Network
