Review Networks for Caption Generation

Zhilin Yang; Ye Yuan; Yuexin Wu; Ruslan Salakhutdinov; William W. Cohen

arXiv:1605.07912 · cs.LG · October 28, 2016 · 84 cites

Open Access

TL;DR

This paper introduces review networks, an extension to encoder-decoder models that enhances caption generation by performing review steps with attention, leading to improved performance on image and code captioning tasks.

Contribution

The paper presents a novel review network framework that generalizes and improves existing encoder-decoder models for caption generation.

Findings

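01 Improves over state-of-the-art encoder-decoder systems on image captioning

02 Improves over state-of-the-art encoder-decoder systems on source code captioning

03 Framework is generic: it applies to RNN decoders with either CNN or RNN encoders, and conventional encoder-decoders are a special case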

Abstract

We propose a novel extension of the encoder-decoder framework, called a review network. The review network is generic and can enhance any existing encoder-decoder model: in this paper, we consider RNN decoders with both CNN and RNN encoders. The review network performs a number of review steps with attention mechanism on the encoder hidden states, and outputs a thought vector after each review step; the thought vectors are used as the input of the attention mechanism in the decoder. We show that conventional encoder-decoders are a special case of our framework. Empirically, we show that our framework improves over state-of-the-art encoder-decoder systems on the tasks of image captioning and source code captioning.
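
The data flow described in the abstract can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' implementation: the dot-product attention, the plain tanh recurrent cell (standing in for whatever recurrent unit the paper uses), the mean-pooled initial state, and the names `attend`, `review`, and `decoder_step` are all assumptions made for the example. It shows the two stages the abstract names: the reviewer attends over the encoder hidden states and emits one thought vector per review step, and the decoder then attends over the thought vectors rather than over the encoder states.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, states):
    # Dot-product attention (an assumed choice): score each row of `states`
    # against `query`, then return the weighted sum and the weights.
    weights = softmax(states @ query)
    return weights @ states, weights

def review(encoder_states, num_steps, W, U):
    # Run `num_steps` review steps over the encoder hidden states and
    # return one thought vector per step, stacked row-wise.
    hidden = encoder_states.mean(axis=0)  # assumed initial reviewer state
    thoughts = []
    for _ in range(num_steps):
        context, _ = attend(hidden, encoder_states)
        # Plain tanh RNN cell, used here only to keep the sketch short.
        hidden = np.tanh(W @ hidden + U @ context)
        thoughts.append(hidden)
    return np.stack(thoughts)

def decoder_step(decoder_hidden, thoughts, W, U):
    # One decoder update: attention is taken over the thought vectors,
    # not over the original encoder states.
    context, weights = attend(decoder_hidden, thoughts)
    new_hidden = np.tanh(W @ decoder_hidden + U @ context)
    return new_hidden, weights

# Hypothetical sizes: 5 encoder states of dimension 8, 3 review steps.
rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(5, 8))
W = 0.1 * rng.normal(size=(8, 8))
U = 0.1 * rng.normal(size=(8, 8))

thoughts = review(encoder_states, num_steps=3, W=W, U=U)
decoder_hidden, weights = decoder_step(thoughts[-1], thoughts, W, U)
```

In this sketch the number of thought vectors is set by `num_steps` and is independent of the number of encoder states, so the decoder always attends over a fixed-size summary of the input.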

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization