Attention Beam: An Image Captioning Approach
Anubhav Shrimal, Tanmoy Chakraborty

TL;DR
This paper presents a beam search heuristic, applied on top of encoder-decoder models, that yields better-quality image captions on three benchmark datasets.
Contribution
It proposes a novel beam search heuristic that improves caption quality in encoder-decoder image captioning models.
Findings
Better caption quality on the Flickr8k, Flickr30k, and MS COCO datasets
Reports better-quality captions than the underlying encoder-decoder architecture in benchmark evaluations
Demonstrates effectiveness of heuristic beam search
Abstract
The aim of image captioning is to generate a textual description of a given image. Though seemingly an easy task for humans, it is challenging for machines, as it requires the ability to comprehend the image (computer vision) and then generate a human-like description of it (natural language understanding). In recent times, encoder-decoder based architectures have achieved state-of-the-art results for image captioning. Here, we present a beam search heuristic on top of the encoder-decoder based architecture that gives better quality captions on three benchmark datasets: Flickr8k, Flickr30k and MS COCO.
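For readers unfamiliar with the decoding step the abstract refers to, the following is a minimal sketch of standard beam search over a caption decoder. It is not the authors' heuristic, which the abstract does not describe. The names `beam_search`, `toy_next_log_probs`, and `TOY_MODEL` are hypothetical, and the hand-written probability table merely stands in for a trained encoder-decoder that would condition on image features.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Hypothesis = Tuple[List[str], float]  # (tokens so far, cumulative log-probability)


def beam_search(
    next_log_probs: Callable[[Sequence[str]], Dict[str, float]],
    beam_width: int = 3,
    max_len: int = 10,
    start: str = "<s>",
    end: str = "</s>",
) -> List[Hypothesis]:
    """Standard beam search: keep the `beam_width` best partial captions per step."""
    beams: List[Hypothesis] = [([start], 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates: List[Hypothesis] = []
        for tokens, score in beams:
            for tok, lp in next_log_probs(tokens).items():
                candidates.append((tokens + [tok], score + lp))
        if not candidates:
            break
        # Keep only the highest-scoring expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == end:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished


# Hypothetical stand-in for a decoder: next-token probabilities that depend
# only on the previous token (a real model would also use the image encoding).
TOY_MODEL = {
    "<s>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.5, "cat": 0.5},
    "the": {"dog": 0.9, "cat": 0.1},
    "dog": {"</s>": 1.0},
    "cat": {"</s>": 1.0},
}


def toy_next_log_probs(tokens: Sequence[str]) -> Dict[str, float]:
    return {tok: math.log(p) for tok, p in TOY_MODEL[tokens[-1]].items()}


if __name__ == "__main__":
    greedy = beam_search(toy_next_log_probs, beam_width=1)[0]
    best = beam_search(toy_next_log_probs, beam_width=2)[0]
    print("greedy:", " ".join(greedy[0]), round(math.exp(greedy[1]), 2))
    print("beam-2:", " ".join(best[0]), round(math.exp(best[1]), 2))
```

In this toy setting, a beam width of 1 (greedy decoding) commits to "a" at the first step and ends with a caption of probability 0.30, while a beam width of 2 keeps "the" alive and recovers the higher-probability caption "the dog" (0.36). Heuristics layered on beam search typically change how candidates are scored or pruned in the loop above.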
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
