Image Captioning with Context-Aware Auxiliary Guidance

Zeliang Song; Xiaofei Zhou; Zhendong Mao; Jianlong Tan

arXiv:2012.05545 · cs.CV · January 5, 2021 · 5 cites

TL;DR

This paper introduces a Context-Aware Auxiliary Guidance mechanism for image captioning, enhancing the model's ability to utilize global context and improve caption quality.

Contribution

It proposes a novel CAAG mechanism that guides captioning models to better perceive global context using semantic attention, applicable to various captioning architectures.

Findings

01

Achieved a CIDEr-D score of 132.2 on the Microsoft COCO benchmark

02

Demonstrated improved performance across three captioning models

03

Reported competitive results on the Microsoft COCO image captioning benchmark

Abstract

Image captioning is a challenging computer vision task that aims to generate a natural language description of an image. Most recent work follows the encoder-decoder framework, which depends heavily on previously generated words for the current prediction. Such methods cannot effectively take advantage of future predicted information to learn complete semantics. In this paper, we propose a Context-Aware Auxiliary Guidance (CAAG) mechanism that guides the captioning model to perceive global contexts. On top of the captioning model, CAAG performs semantic attention that selectively concentrates on useful information in the global predictions to reproduce the current generation. To validate the adaptability of the method, we apply CAAG to three popular captioners, and our proposal achieves competitive performance on the challenging Microsoft COCO image captioning benchmark, e.g.…
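The semantic attention step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the scaled dot-product scoring, the function names `softmax` and `semantic_attention`, and the toy two-dimensional embeddings are all assumptions chosen for illustration. The sketch only shows the general idea of a decoder state at the current step attending over embeddings of every word in a first-pass caption (the "global predictions"), including words that come later in the sentence.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def semantic_attention(
    query: Vector, word_embeddings: List[Vector]
) -> Tuple[List[float], Vector]:
    """Attend over the embeddings of all words in a first-pass caption.

    `query` stands in for the decoder state at the current step.
    Scaled dot-product scoring is an assumption made for this sketch;
    the paper may use a different scoring function.
    """
    dim = len(query)
    scores = [
        sum(q * w for q, w in zip(query, emb)) / math.sqrt(dim)
        for emb in word_embeddings
    ]
    weights = softmax(scores)
    # Weighted sum of word embeddings: a context vector that summarizes
    # the whole predicted caption, not only the words generated so far.
    context = [
        sum(wt * emb[i] for wt, emb in zip(weights, word_embeddings))
        for i in range(dim)
    ]
    return weights, context


# Hypothetical toy data: three predicted words with 2-d embeddings.
caption_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden_state = [1.0, 0.0]
weights, context = semantic_attention(hidden_state, caption_embeddings)
```

In a full model, the context vector would be combined with the decoder state to re-predict the current word, and that auxiliary prediction would provide the guidance signal. That part is omitted here because the abstract does not specify it.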

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques