Review Networks for Caption Generation
Zhilin Yang, Ye Yuan, Yuexin Wu, Ruslan Salakhutdinov, William W. Cohen

TL;DR
This paper introduces review networks, an extension to encoder-decoder models that enhances caption generation by performing review steps with attention, leading to improved performance on image and code captioning tasks.
Contribution
The paper presents a novel review network framework that generalizes and improves existing encoder-decoder models for caption generation.
Findings
Improves over state-of-the-art encoder-decoder systems on image captioning
Improves over state-of-the-art encoder-decoder systems on source code captioning
Framework is compatible with various encoder-decoder architectures
Abstract
We propose a novel extension of the encoder-decoder framework, called a review network. The review network is generic and can enhance any existing encoder-decoder model: in this paper, we consider RNN decoders with both CNN and RNN encoders. The review network performs a number of review steps with attention mechanism on the encoder hidden states, and outputs a thought vector after each review step; the thought vectors are used as the input of the attention mechanism in the decoder. We show that conventional encoder-decoders are a special case of our framework. Empirically, we show that our framework improves over state-of-the-art encoder-decoder systems on the tasks of image captioning and source code captioning.
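The review mechanism described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the bilinear attention score, the plain tanh recurrent update, the zero initial state, and the names `attend`, `review`, `init_params`, `W_att`, `W_ctx`, and `W_rec` are all assumptions made for illustration, and the weights are random rather than trained. The sketch only shows the data flow: each review step attends over the encoder hidden states and emits one thought vector, and the decoder then attends over the thought vectors instead of the encoder states.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()


def attend(states, query, W):
    # Bilinear attention (an assumed scoring function):
    # score_i = states[i] . (W @ query); returns the weighted sum and the weights.
    weights = softmax(states @ (W @ query))
    return weights @ states, weights


def init_params(d, seed=0):
    # Random, untrained weights; hypothetical parameter names.
    rng = np.random.default_rng(seed)
    return {k: rng.normal(scale=0.1, size=(d, d)) for k in ("W_att", "W_ctx", "W_rec")}


def review(encoder_states, num_steps, params):
    # Run `num_steps` review steps over encoder_states of shape (n, d).
    # Each step attends over all encoder states, then updates the reviewer
    # state with a simple tanh recurrence (an assumption, chosen for brevity).
    d = encoder_states.shape[1]
    f = np.zeros(d)  # initial reviewer state (assumed to be zeros)
    thoughts = []
    for _ in range(num_steps):
        context, _ = attend(encoder_states, f, params["W_att"])
        f = np.tanh(params["W_ctx"] @ context + params["W_rec"] @ f)
        thoughts.append(f)  # one thought vector per review step
    return np.stack(thoughts)  # shape (num_steps, d)


# Toy usage: 6 encoder hidden states of size 4, 3 review steps.
H = np.random.default_rng(1).normal(size=(6, 4))
params = init_params(4)
thought_vectors = review(H, num_steps=3, params=params)

# The decoder's attention now takes the thought vectors as its input.
decoder_state = np.ones(4)  # placeholder decoder hidden state
context, weights = attend(thought_vectors, decoder_state, params["W_att"])
```

In this sketch the number of review steps is independent of the number of encoder states, so the decoder attends over a fixed-size set of thought vectors regardless of input length.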
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
