Attention Beam: An Image Captioning Approach

Anubhav Shrimal; Tanmoy Chakraborty

arXiv:2011.01753·cs.CV·November 12, 2020

TL;DR

This paper introduces a beam search heuristic, applied on top of encoder-decoder models, that produces better-quality image captions on three benchmark datasets: Flickr8k, Flickr30k and MS COCO.

Contribution

It proposes a novel beam search heuristic that improves caption quality in encoder-decoder image captioning models.

Findings

01

Better caption quality on the Flickr8k, Flickr30k and MS COCO datasets

02

Outperforms existing methods in benchmark evaluations

03

Demonstrates effectiveness of heuristic beam search

Abstract

The aim of image captioning is to generate a textual description of a given image. Though seemingly an easy task for humans, it is challenging for machines, as it requires the ability to comprehend the image (computer vision) and consequently generate a human-like description of the image (natural language understanding). In recent times, encoder-decoder-based architectures have achieved state-of-the-art results for image captioning. Here, we present a heuristic of beam search on top of the encoder-decoder-based architecture that gives better quality captions on three benchmark datasets: Flickr8k, Flickr30k and MS COCO.
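
For readers unfamiliar with the decoding step the abstract refers to, the following is a minimal sketch of standard beam search over a toy next-token table. It is not the heuristic proposed in the paper, which the abstract does not specify. The names `beam_search`, `toy_step`, and `TOY_MODEL`, along with the probabilities, are hypothetical and stand in for a trained caption decoder conditioned on image features.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[List[str], float]  # (tokens so far, summed log-probability)

# Hypothetical next-token probabilities keyed by the previous token.
# A real captioning decoder would also condition on the encoded image.
TOY_MODEL: Dict[str, Dict[str, float]] = {
    "<s>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.55, "cat": 0.45},
    "the": {"dog": 0.9, "cat": 0.1},
    "dog": {"</s>": 1.0},
    "cat": {"</s>": 1.0},
}


def toy_step(tokens: List[str]) -> Dict[str, float]:
    """Return the next-token distribution given the tokens generated so far."""
    return TOY_MODEL[tokens[-1]]


def beam_search(
    step_fn: Callable[[List[str]], Dict[str, float]],
    beam_width: int = 3,
    max_len: int = 10,
    start: str = "<s>",
    end: str = "</s>",
) -> Hypothesis:
    """Plain beam search: keep the `beam_width` best partial captions per step."""
    beams: List[Hypothesis] = [([start], 0.0)]
    finished: List[Hypothesis] = []

    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates: List[Hypothesis] = []
        for tokens, score in beams:
            for tok, prob in step_fn(tokens).items():
                if prob > 0.0:
                    candidates.append((tokens + [tok], score + math.log(prob)))

        # Keep only the top-scoring candidates.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == end:
                finished.append((tokens, score))  # complete caption
            else:
                beams.append((tokens, score))  # still being extended
        if not beams:
            break

    # Prefer completed captions; fall back to unfinished ones if none ended.
    pool = finished or beams
    return max(pool, key=lambda c: c[1])


greedy_tokens, greedy_score = beam_search(toy_step, beam_width=1)
beam_tokens, beam_score = beam_search(toy_step, beam_width=2)
print(" ".join(greedy_tokens), round(math.exp(greedy_score), 2))
print(" ".join(beam_tokens), round(math.exp(beam_score), 2))
```

In this toy table, a beam width of 1 (greedy decoding) commits to "a" at the first step and ends with a sequence of probability 0.33, while a beam width of 2 keeps "the" alive and recovers the more probable sequence (0.36). A heuristic such as the one the paper describes would presumably change how hypotheses are scored or pruned, but that detail is an assumption and is not shown here.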

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization