Generate Image Descriptions based on Deep RNN and Memory Cells for Images Features

Shijian Tang; Song Han

arXiv:1602.01895 · cs.CV · February 8, 2016 · 1 citation

TL;DR

This paper introduces a novel deep RNN model with memory cells that selectively gate image feature information, improving the quality of generated image descriptions on benchmark datasets.

Contribution

The paper proposes adding memory cells to RNNs for image captioning, allowing better control over image feature integration during sentence generation.

Findings

01 Outperforms state-of-the-art models on Flickr8K and Flickr30K datasets

02 Achieves higher BLEU scores indicating improved caption quality

03 Demonstrates effective memorization of image feature importance
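
Since the findings are stated in terms of BLEU, a minimal sketch of how a sentence-level BLEU score is computed may help in reading them. It follows the standard definition (clipped n-gram precision, geometric mean, brevity penalty); the function names, the example captions, and the choice of `max_n` are illustrative assumptions, and nothing here reproduces the paper's own evaluation setup.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it appears in any single reference."""
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / sum(cand.values())

def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, any zero precision zeroes the score
    log_avg = sum(math.log(p) for p in precisions) / max_n
    c = len(candidate)
    # Reference length closest to the candidate length (ties: the shorter one).
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity_penalty * math.exp(log_avg)

# Hypothetical captions, for illustration only.
reference = "a dog runs across the grass".split()
exact = bleu(reference, [reference])
close = bleu("a dog runs on the grass".split(), [reference], max_n=2)
print(exact, round(close, 4))
```

An exact match scores 1.0, while the near-miss caption is penalized at every n-gram order it disagrees on; with the default `max_n=4` and no smoothing it would score 0, which is why short-caption evaluations often report BLEU-1 through BLEU-4 separately.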

Abstract

Generating natural language descriptions for images is a challenging task. The traditional approach uses a convolutional neural network (CNN) to extract image features, followed by a recurrent neural network (RNN) to generate sentences. In this paper, we present a new model that adds memory cells to gate the feeding of image features to the deep neural network. The intuition is to enable our model to memorize how much information from the image should be fed at each stage of the RNN. Experiments on the Flickr8K and Flickr30K datasets show that our model outperforms other state-of-the-art models, achieving higher BLEU scores.
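
The gating idea in the abstract can be sketched in a few lines. This page does not give the paper's equations, so the sketch below is only one plausible reading: a plain tanh RNN step in which a sigmoid gate, computed from the previous hidden state and the current word, scales the projected image features before they enter the recurrence. The parameter names (`W_h`, `W_x`, `W_v`, `U_h`, `U_x`), the dimensions, and the random inputs are all hypothetical and are not taken from the authors' model.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def gated_rnn_step(h_prev, x_t, img, params):
    """One recurrent step with gated image input (illustrative assumption,
    not the paper's exact formulation)."""
    # Gate in (0, 1) per hidden unit: how much image information to admit now.
    gate = [sigmoid(a + b) for a, b in
            zip(matvec(params["U_h"], h_prev), matvec(params["U_x"], x_t))]
    # Image features projected into the hidden space.
    v = matvec(params["W_v"], img)
    # Recurrent update: the image contribution is scaled elementwise by the gate.
    pre = [a + b + g * c for a, b, g, c in
           zip(matvec(params["W_h"], h_prev), matvec(params["W_x"], x_t), gate, v)]
    h_t = [math.tanh(p) for p in pre]
    return h_t, gate

rng = random.Random(0)
hidden, embed, feat = 4, 3, 5  # toy sizes chosen for the example
params = {
    "W_h": rand_matrix(hidden, hidden, rng),
    "W_x": rand_matrix(hidden, embed, rng),
    "W_v": rand_matrix(hidden, feat, rng),
    "U_h": rand_matrix(hidden, hidden, rng),
    "U_x": rand_matrix(hidden, embed, rng),
}
img = [rng.uniform(0, 1) for _ in range(feat)]        # stand-in for CNN features
words = [[rng.uniform(-1, 1) for _ in range(embed)]   # stand-in word embeddings
         for _ in range(3)]

h = [0.0] * hidden
gates = []
for x_t in words:
    h, gate = gated_rnn_step(h, x_t, img, params)
    gates.append(gate)
    print([round(g, 3) for g in gate])
```

Because the gate depends on the hidden state and the current word, the amount of image information admitted can differ from step to step, which is the behavior the abstract describes as memorizing how much image information to feed at each stage.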

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Topic Modeling