Adaptively Aligned Image Captioning via Adaptive Attention Time
Lun Huang, Wenmin Wang, Yaxian Xia, Jie Chen

TL;DR
This paper introduces Adaptive Attention Time (AAT), a novel attention mechanism for image captioning that adaptively determines the number of attention steps per caption word, improving alignment flexibility and caption quality.
Contribution
The paper proposes a new deterministic, differentiable attention model that allows flexible source-target alignment in image captioning, surpassing traditional one-to-one attention methods.
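For contrast, the conventional one-step attention that AAT is compared against can be sketched as follows. This is a minimal illustration that assumes dot-product scoring and plain Python lists for region features. The names `soft_attention`, `softmax`, and `dot` are hypothetical and are not taken from the paper. Each call turns one decoder query into exactly one attended feature vector, which is the one-vector-per-word pattern the paper describes as a one-to-one alignment.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def soft_attention(query: Vector, regions: List[Vector]) -> Vector:
    """Single attention step (assumed dot-product scoring): score every image
    region against the decoder query, normalise the scores, and return one
    weighted-average region feature."""
    weights = softmax([dot(query, r) for r in regions])
    dim = len(regions[0])
    return [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]


regions = [[1.0, 0.0], [0.0, 1.0]]
context = soft_attention([0.0, 0.0], regions)  # equal scores, so an even mix
```

A decoder using this scheme calls `soft_attention` once per generated word, so every word receives exactly one attended vector regardless of how many regions it actually relates to.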
Findings
AAT outperforms state-of-the-art methods on image captioning benchmarks.
AAT enables flexible, multi-region to multi-word alignments.
The model is deterministic and differentiable, so it adds no sampling noise to the parameter gradients.
Abstract
Recent neural models for image captioning usually employ an encoder-decoder framework with an attention mechanism. However, the attention mechanism in such a framework aligns one single (attended) image feature vector to one caption word, assuming a one-to-one mapping between source image regions and target caption words, which is never possible. In this paper, we propose a novel attention model, namely Adaptive Attention Time (AAT), to align the source and the target adaptively for image captioning. AAT allows the framework to learn how many attention steps to take to output a caption word at each decoding step. With AAT, an image region can be mapped to an arbitrary number of caption words, while a caption word can also attend to an arbitrary number of image regions. AAT is deterministic and differentiable, and does not introduce any noise to the parameter gradients. In this paper, we…
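The idea of letting the decoder learn how many attention steps to take per word can be illustrated with a halting rule in the style of Adaptive Computation Time (ACT, Graves 2016). This is a sketch under stated assumptions, not the authors' formulation: the dot-product `attend`, the `tanh` state update, and the halting parameters `w_halt` and `b_halt` are all hypothetical. At each step a sigmoid halting probability is accumulated; the loop stops once the sum reaches `1 - eps` or `max_steps` is hit, and the output is the probability-weighted mix of the intermediate states. No sampling is involved, so the same inputs always produce the same output.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, regions: List[Vector]) -> Vector:
    # Hypothetical dot-product soft attention over region features.
    weights = softmax([sum(q * r for q, r in zip(query, reg)) for reg in regions])
    return [sum(w * reg[d] for w, reg in zip(weights, regions)) for d in range(len(query))]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def adaptive_attention(h: Vector, regions: List[Vector], w_halt: Vector, b_halt: float,
                       max_steps: int = 4, eps: float = 0.01) -> Tuple[Vector, int]:
    """ACT-style loop (assumed, not the paper's exact rule): keep attending until
    the accumulated halting probability reaches 1 - eps or max_steps is hit.
    Returns the probability-weighted mix of intermediate states and the step count."""
    states: List[Vector] = []
    probs: List[float] = []
    total = 0.0
    for n in range(1, max_steps + 1):
        ctx = attend(h, regions)
        # Hypothetical state update: fuse the attended feature into the state.
        h = [math.tanh(a + c) for a, c in zip(h, ctx)]
        p = sigmoid(sum(wi * hi for wi, hi in zip(w_halt, h)) + b_halt)
        states.append(h)
        if total + p >= 1.0 - eps or n == max_steps:
            # The final step takes the remainder, so the weights sum to exactly 1.
            probs.append(1.0 - total)
            break
        probs.append(p)
        total += p
    out = [sum(p * s[d] for p, s in zip(probs, states)) for d in range(len(h))]
    return out, len(states)


regions = [[1.0, 0.0], [0.0, 1.0]]
_, quick = adaptive_attention([0.1, -0.2], regions, [0.0, 0.0], b_halt=10.0)
_, slow = adaptive_attention([0.1, -0.2], regions, [0.0, 0.0], b_halt=-10.0)
```

With a strongly positive `b_halt` the loop stops after a single step, while a strongly negative one runs to `max_steps`; a learned halting layer could land anywhere in between, which is how different words can receive different numbers of attention steps.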
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications
