Guiding Attention using Partial-Order Relationships for Image Captioning
Murad Popattia, Muhammad Rafi, Rizwan Qureshi, Shah Nawaz

TL;DR
This paper introduces a guided attention network for image captioning that leverages partial-order relationships between visual features, topics, and captions in a shared embedding space to improve caption accuracy.
Contribution
It proposes a novel guided attention mechanism using a partial-order embedding space trained with a pairwise ranking objective for better image captioning.
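The pairwise ranking objective over a partial-order embedding space can be illustrated with a minimal sketch. The source does not give the exact formulation. This sketch therefore assumes the order-violation penalty from order-embeddings (Vendrov et al., 2016), E(x, y) = ||max(0, y - x)||^2. It pairs that penalty with a max-margin hinge loss in which every non-matching item in a batch serves as a negative. The function names, the margin value, and the toy vectors are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def order_violation(specific: Vector, general: Vector) -> float:
    """Order-violation penalty E(x, y) = ||max(0, y - x)||^2.

    Assumed convention (order-embeddings): x lies below y (x more specific,
    y more general) when x_i >= y_i for every coordinate i, in which case
    the penalty is exactly zero.
    """
    return sum(max(0.0, g - s) ** 2 for s, g in zip(specific, general))


def pairwise_ranking_loss(images: List[Vector], captions: List[Vector],
                          margin: float = 0.05) -> float:
    """Max-margin ranking loss over a batch of matching (image, caption) pairs.

    images[k] and captions[k] are assumed to match; every other item in the
    batch acts as a contrastive negative. The margin is a hypothetical value.
    """
    n = len(images)
    loss = 0.0
    for k in range(n):
        positive = order_violation(images[k], captions[k])
        for j in range(n):
            if j == k:
                continue
            # Negative caption: should violate the order with images[k] by at least `margin`.
            loss += max(0.0, margin + positive - order_violation(images[k], captions[j]))
            # Negative image: should violate the order with captions[k] by at least `margin`.
            loss += max(0.0, margin + positive - order_violation(images[j], captions[k]))
    return loss


# Toy non-negative embeddings (hypothetical): each caption is a coordinate-wise
# "shrunk" copy of its own image, so the matching pairs satisfy the order.
images = [[1.0, 0.0], [0.0, 1.0]]
captions = [[0.5, 0.0], [0.0, 0.5]]
print(pairwise_ranking_loss(images, captions))        # 0.0 (ordered batch)
print(pairwise_ranking_loss(images, captions[::-1]))  # roughly 1.2 (mismatched batch)
```

Under the same assumption, the penalty could also be applied to image-topic and topic-caption pairs. The source does not specify how images, topics and captions are ordered relative to one another.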
Findings
Achieves competitive results on the MSCOCO dataset
Outperforms several state-of-the-art models on multiple metrics
Demonstrates the effectiveness of partial-order relationships in attention models
Abstract
The use of attention models for automated image captioning has enabled many systems to produce accurate and meaningful descriptions for images. Over the years, many novel approaches have been proposed to enhance the attention process using different feature representations. In this paper, we extend this approach by creating a guided attention network mechanism that exploits the relationship between the visual scene and text descriptions. It uses spatial features from the image, high-level information from the topics, and temporal context from caption generation, which are embedded together in an ordered embedding space. A pairwise ranking objective is used to train this embedding space. This allows similar images, topics and captions in the shared semantic space to maintain a partial order in the visual-semantic hierarchy, and hence helps the model to produce more visually accurate…
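The abstract describes guided attention in which spatial image features are weighted using both topic information and the decoder's temporal context. This can be sketched as soft attention whose query is built from both signals. The sketch is an illustrative assumption, not the authors' architecture. It assumes all inputs are already projected to one shared dimension. The query is simply the element-wise sum of a decoder hidden state and a topic vector, and scores are scaled dot products. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def guided_attention(regions: Sequence[Sequence[float]],
                     hidden: Sequence[float],
                     topic: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Soft attention over spatial region features, guided by topic and decoder state.

    Hypothetical design: the query is the element-wise sum of the decoder
    hidden state (temporal context) and the topic vector (high-level
    information). Returns (attention weights, context vector).
    """
    dim = len(hidden)
    query = [h + t for h, t in zip(hidden, topic)]
    scale = 1.0 / math.sqrt(dim)
    # Scaled dot-product score between each region feature and the guided query.
    scores = [scale * sum(r * q for r, q in zip(region, query)) for region in regions]
    weights = softmax(scores)
    # Context vector: attention-weighted average of the region features.
    context = [sum(w * region[d] for w, region in zip(weights, regions)) for d in range(dim)]
    return weights, context


# Two toy regions; the hypothetical topic vector points toward the second one.
regions = [[1.0, 0.0], [0.0, 1.0]]
weights, context = guided_attention(regions, hidden=[0.0, 0.0], topic=[0.0, 2.0])
print(weights, context)
```

Summing the two guidance vectors is only one option. Concatenating them and applying a learned projection, or using additive attention, would be equally plausible readings of the abstract.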
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
