Learning a Recurrent Visual Representation for Image Caption Generation
Xinlei Chen, C. Lawrence Zitnick

TL;DR
This paper introduces a recurrent neural network with a visual memory that maps in both directions between images and sentences: it generates novel captions for an image and reconstructs visual features from a description. It reports state-of-the-art results on caption generation.
Contribution
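It presents a recurrent visual memory that learns to remember long-term visual concepts, aiding both sentence generation and visual feature reconstruction. Unlike previous approaches that map images and sentences to a common embedding, the model can generate novel sentences for an image.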
Findings
State-of-the-art image caption generation results.
When compared with human-written captions, the automatically generated captions were preferred by human judges over 19.8% of the time.
Competitive performance on image and sentence retrieval tasks.
Abstract
In this paper we explore the bi-directional mapping between images and their sentence-based descriptions. We propose learning this mapping using a recurrent neural network. Unlike previous approaches that map both sentences and images to a common embedding, we enable the generation of novel sentences given an image. Using the same model, we can also reconstruct the visual features associated with an image given its visual description. We use a novel recurrent visual memory that automatically learns to remember long-term visual concepts to aid in both sentence generation and visual feature reconstruction. We evaluate our approach on several tasks. These include sentence generation, sentence retrieval and image retrieval. State-of-the-art results are shown for the task of generating novel image descriptions. When compared to human generated captions, our automatically generated captions…
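The bidirectional idea in the abstract can be illustrated with a toy sketch. This is not the authors' architecture or code: the class name `ToyVisualMemoryRNN`, the layer sizes, the weight names, the sigmoid nonlinearity, and the random initialization are all assumptions made for illustration. It only shows the general shape of the idea: a recurrent language state predicts the next word given image features, while a separate recurrent "visual memory" accumulates what the words seen so far imply about the image and is used to reconstruct visual features.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ToyVisualMemoryRNN:
    """Illustrative only: untrained random weights, hypothetical layer layout."""

    def __init__(self, vocab_size, hidden_size, memory_size, visual_size, seed=0):
        rng = np.random.default_rng(seed)
        init = lambda *shape: rng.normal(0.0, 0.1, size=shape)
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.memory_size = memory_size
        # Language pathway: word -> hidden state -> next-word scores.
        self.W_ws = init(hidden_size, vocab_size)
        self.W_ss = init(hidden_size, hidden_size)
        self.W_sw = init(vocab_size, hidden_size)
        # Image features also feed the next-word scores (image -> sentence).
        self.W_vw = init(vocab_size, visual_size)
        # Visual memory pathway: words -> memory -> reconstructed features.
        self.W_wu = init(memory_size, vocab_size)
        self.W_uu = init(memory_size, memory_size)
        self.W_uv = init(visual_size, memory_size)

    def run(self, word_ids, visual_features):
        s = np.zeros(self.hidden_size)   # recurrent language state
        u = np.zeros(self.memory_size)   # recurrent visual memory
        word_probs = []
        for idx in word_ids:
            w = np.zeros(self.vocab_size)
            w[idx] = 1.0                 # one-hot encoding of the current word
            s = sigmoid(self.W_ws @ w + self.W_ss @ s)
            u = sigmoid(self.W_wu @ w + self.W_uu @ u)
            logits = self.W_sw @ s + self.W_vw @ visual_features
            word_probs.append(softmax(logits))
        v_hat = self.W_uv @ u            # sentence -> visual features
        return np.stack(word_probs), v_hat

model = ToyVisualMemoryRNN(vocab_size=10, hidden_size=8, memory_size=6, visual_size=4)
probs, v_hat = model.run([1, 3, 5], np.ones(4))
print(probs.shape, v_hat.shape)
```

In a sketch like this, caption generation would sample from each row of `probs`, and retrieval could rank images by how close their features are to `v_hat`; how the paper actually trains and scores these quantities is not stated in the abstract above.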
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
