Context-Aware Visual Policy Network for Sequence-Level Image Captioning

Daqing Liu; Zheng-Jun Zha; Hanwang Zhang; Yongdong Zhang; Feng Wu

arXiv:1808.05864 · cs.CV · August 23, 2018

TL;DR

This paper introduces a context-aware visual policy network for image captioning that explicitly models visual context across decoding steps, helping the model generate more accurate and contextually rich captions.

Contribution

It proposes a novel visual policy network that treats previous visual attentions as context at every step. Prior RL-based captioning methods optimize only the language policy, so this extends sequence-level training to the visual policy as well.
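To make "previous visual attention as context" concrete, here is a toy, pure-Python attention step. It is a hypothetical sketch, not the CAVP architecture from the paper or the daqingliu/CAVP repository: the mean-pooled history, the additive dot-product scoring, and all function names are assumptions chosen for brevity.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_vector(vectors: List[Vector], dim: int) -> Vector:
    """Element-wise mean; a zero vector when there is no history yet."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def context_aware_attend(
    regions: List[Vector], query: Vector, history: List[Vector]
) -> Tuple[List[float], Vector]:
    """One step of a toy attention that also looks at earlier attended features.

    regions: image-region feature vectors (hypothetical inputs)
    query:   current decoder state, same dimension as the regions
    history: features attended at previous steps (the "visual context")
    """
    dim = len(regions[0])
    context = mean_vector(history, dim)  # assumption: simple mean pooling
    # Score each region against the current query AND the pooled visual context,
    # so what was attended earlier shifts what is attended now.
    scores = [dot(r, query) + dot(r, context) for r in regions]
    weights = softmax(scores)
    attended = [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]
    return weights, attended


# Usage: decode two steps, feeding each attended feature back in as context.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
history: List[Vector] = []
for query in ([1.0, 0.0], [0.0, 1.0]):
    weights, attended = context_aware_attend(regions, query, history)
    history.append(attended)
```

With an empty history this reduces to ordinary dot-product attention; the only difference is the extra context term, which is the point the paper's title emphasizes.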

Findings

1. Achieves state-of-the-art results on the MS-COCO dataset at the time of publication.
2. Models complex visual compositions, such as relationships and comparisons, across decoding steps.
3. Improves caption quality by incorporating visual context.

Abstract

Many vision-language tasks can be reduced to the problem of sequence prediction for natural language output. In particular, recent advances in image captioning use deep reinforcement learning (RL) to alleviate the "exposure bias" during training: the ground-truth subsequence is exposed at every prediction step during training, which introduces bias at test time, when only the predicted subsequence is seen. However, existing RL-based image captioning methods focus only on the language policy and not on the visual policy (e.g., visual attention), and thus fail to capture the visual context that is crucial for compositional reasoning such as visual relationships (e.g., "man riding horse") and comparisons (e.g., "smaller cat"). To fill the gap, we propose a Context-Aware Visual Policy network (CAVP) for sequence-level image captioning. At every time step, CAVP explicitly accounts for the previous visual attentions as…
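Sequence-level RL addresses exposure bias by scoring whole captions the model sampled from its own predictions rather than ground-truth prefixes. The sketch below shows a REINFORCE-style loss with a baseline for one sampled caption. It is an illustrative assumption, not the paper's exact objective: the visible abstract does not specify the RL algorithm, the reward is taken to be some externally computed sentence-level score, and the function names are hypothetical.

```python
import math
from typing import List


def sequence_log_prob(token_probs: List[float]) -> float:
    """log p(w_1..w_T) = sum_t log p(w_t | w_<t), with w_<t the model's OWN samples."""
    return sum(math.log(p) for p in token_probs)


def policy_gradient_loss(token_probs: List[float], reward: float, baseline: float) -> float:
    """REINFORCE loss for one sampled caption.

    token_probs: probability the model assigned to each token it sampled
    reward:      sentence-level score of the finished caption (assumed external)
    baseline:    reward of a reference decode, used to reduce variance (assumption)
    """
    advantage = reward - baseline
    # Minimizing this raises the log-probability of captions that beat the baseline
    # and lowers it for captions that fall short.
    return -advantage * sequence_log_prob(token_probs)


# Usage: a two-token caption that scored 0.5 above its baseline.
loss = policy_gradient_loss([0.5, 0.5], reward=1.0, baseline=0.5)
```

Because the reward is attached to the complete sampled sequence, training sees the same kind of self-generated prefixes as inference, which is how this family of objectives sidesteps exposure bias.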

Code & Models

Repositories

daqingliu/CAVP (PyTorch, official)
