A Novel Actor Dual-Critic Model for Remote Sensing Image Captioning

Ruchika Chavhan; Biplab Banerjee; Xiao Xiang Zhu; Subhasis Chaudhuri

arXiv:2010.01999·cs.CV·October 6, 2020

TL;DR

This paper introduces a novel actor dual-critic deep reinforcement learning model for remote sensing image captioning, improving caption accuracy by encoding semantic information and high-level image comprehension.

Contribution

It proposes an Actor Dual-Critic training strategy with an encoder-decoder RNN to enhance semantic understanding and caption quality in remote sensing image captioning.

Findings

01

Outperforms previous state-of-the-art on RSICD and UCM-captions datasets.

02

Achieves significant improvements in ROUGE-L and CIDEr scores.

03

Generates captions that are highly similar to, or better than, the ground truth in critical cases.

Abstract

We deal with the problem of generating textual captions from optical remote sensing (RS) images using the notion of deep reinforcement learning. Due to the high inter-class similarity in reference sentences describing remote sensing data, jointly encoding the sentences and images encourages prediction of captions that are semantically more precise than the ground truth in many cases. To this end, we introduce an Actor Dual-Critic training strategy where a second critic model is deployed in the form of an encoder-decoder RNN to encode the latent information corresponding to the original and generated captions. While all actor-critic methods use an actor to predict sentences for an image and a critic to provide rewards, our proposed encoder-decoder RNN guarantees high-level comprehension of images by sentence-to-image translation. We observe that the proposed model generates sentences on…
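The training signal described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the second critic scores a generated caption by the cosine similarity between its latent encoding and that of the reference caption, that the first critic supplies a scalar value estimate used as a baseline, and that the actor is updated with a standard policy-gradient loss. The function names, the choice of cosine similarity, and the way the two signals are combined are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def dual_critic_advantage(value_estimate, generated_latent, reference_latent):
    """Hypothetical advantage: latent-similarity reward minus the value baseline.

    value_estimate   -- scalar from the first (value) critic, assumed here
    generated_latent -- encoding of the actor's caption from the second critic, assumed here
    reference_latent -- encoding of the ground-truth caption from the second critic, assumed here
    """
    similarity_reward = cosine_similarity(generated_latent, reference_latent)
    return similarity_reward - value_estimate


def actor_loss(token_log_probs, advantage):
    """Policy-gradient loss with a baseline: -advantage * sum of token log-probabilities."""
    return -advantage * sum(token_log_probs)


# Toy values only; real latents would come from trained encoder-decoder RNNs.
advantage = dual_critic_advantage(0.2, [1.0, 0.0], [1.0, 0.0])
loss = actor_loss([-0.5, -0.5], advantage)
```

In this sketch, a caption whose latent encoding matches the reference more closely than the baseline predicted yields a positive advantage, so minimising the loss raises the probability of the sampled tokens.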
