Deep Reinforcement Learning-based Image Captioning with Embedding Reward

Zhou Ren; Xiaoyu Wang; Ning Zhang; Xutao Lv; Li-Jia Li

arXiv:1704.03899·cs.CV·April 14, 2017·64 cites

TL;DR

This paper proposes a reinforcement learning framework for image captioning in which a policy network and a value network, trained with a visual-semantic embedding reward, jointly generate captions.

Contribution

It introduces a decision-making framework with policy and value networks trained via actor-critic reinforcement learning, utilizing a new embedding-based reward for improved captioning.
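As a rough illustration of what an embedding-based reward could look like, the sketch below scores a caption by the cosine similarity between an image vector and a caption vector in a shared space. The summary above only says the reward is based on a visual-semantic embedding; the use of cosine similarity, the function names `cosine_similarity` and `embedding_reward`, and the toy vectors are assumptions made for illustration, not the authors' definition.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_u * norm_v)

def embedding_reward(image_vec, caption_vec):
    """Hypothetical reward: similarity of image and caption embeddings.

    Both vectors are assumed to already live in the same embedding space;
    how they are produced is outside the scope of this sketch.
    """
    return cosine_similarity(image_vec, caption_vec)

# Toy vectors standing in for learned embeddings (made up for illustration).
image_vec = [1.0, 0.0]
matching_caption_vec = [1.0, 0.0]
unrelated_caption_vec = [0.0, 1.0]

good = embedding_reward(image_vec, matching_caption_vec)
bad = embedding_reward(image_vec, unrelated_caption_vec)
```

Under this assumption, a caption whose embedding points in the same direction as the image embedding receives a higher reward than one that points elsewhere.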

Findings

01 Outperforms state-of-the-art methods on the Microsoft COCO dataset

02 Improves results across different evaluation metrics

03 Validates the effectiveness of the embedding-based reward in caption generation

Abstract

Image captioning is a challenging problem owing to the complexity in understanding the image content and diverse ways of describing it in natural language. Recent advances in deep neural networks have substantially improved the performance of this task. Most state-of-the-art approaches follow an encoder-decoder framework, which generates captions using a sequential recurrent prediction model. However, in this paper, we introduce a novel decision-making framework for image captioning. We utilize a "policy network" and a "value network" to collaboratively generate captions. The policy network serves as a local guidance by providing the confidence of predicting the next word according to the current state. Additionally, the value network serves as a global and lookahead guidance by evaluating all possible extensions of the current state. In essence, it adjusts the goal of predicting the…
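The interplay between local guidance (the policy network's next-word confidence) and global lookahead guidance (the value network's evaluation of each extension) can be sketched as a scoring rule over candidate next words. The abstract excerpt above does not say how the two signals are combined, so the linear mix below, the weight `lam`, the function name `pick_next_word`, and the toy numbers are all assumptions made for illustration.

```python
import math

def pick_next_word(policy_probs, value_scores, lam=0.4):
    """Pick the candidate word with the best mixed score.

    policy_probs: word -> probability from a (hypothetical) policy network.
    value_scores: word -> score from a (hypothetical) value network for the
                  state reached by appending that word.
    lam:          assumed mixing weight; lam=1.0 uses the policy alone.
    """
    def mixed(word):
        local = math.log(policy_probs[word])   # local guidance
        lookahead = value_scores[word]         # global lookahead guidance
        return lam * local + (1.0 - lam) * lookahead
    return max(policy_probs, key=mixed)

# Toy numbers, made up for illustration only.
policy_probs = {"dog": 0.5, "cat": 0.3, "frisbee": 0.2}
value_scores = {"dog": 0.1, "cat": 0.2, "frisbee": 0.9}

policy_only = pick_next_word(policy_probs, value_scores, lam=1.0)
combined = pick_next_word(policy_probs, value_scores, lam=0.4)
```

With these toy numbers the policy alone prefers "dog", while the mixed score prefers "frisbee": a word with lower local confidence can win when the lookahead score for the resulting state is high.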

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning