Guiding Long-Short Term Memory for Image Caption Generation

Xu Jia; Efstratios Gavves; Basura Fernando; Tinne Tuytelaars

arXiv:1509.04942 · cs.CV · September 17, 2015 · 73 cites

TL;DR

This paper introduces gLSTM, an enhanced LSTM model for image captioning that incorporates semantic image information and improved beam search strategies, achieving competitive results on standard datasets.

Contribution

The paper presents a novel gLSTM model that integrates semantic image features into each LSTM unit for better caption generation.
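
To make the idea of feeding semantic image features into each LSTM unit concrete, here is a minimal NumPy sketch of one recurrent step. It assumes a standard LSTM formulation in which every gate and the cell candidate receive an extra linear term computed from a fixed guidance vector `g`. The function names, the parameter layout, and the dimensions are illustrative assumptions, not taken from the paper, and the paper's exact equations may differ.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(x_dim, h_dim, g_dim, seed=0):
    """Hypothetical parameter layout: one (W_x, W_h, W_g, b) tuple per unit."""
    rng = np.random.default_rng(seed)
    return {
        name: (
            rng.normal(0.0, 0.1, (h_dim, x_dim)),  # weights on the word input
            rng.normal(0.0, 0.1, (h_dim, h_dim)),  # weights on the previous hidden state
            rng.normal(0.0, 0.1, (h_dim, g_dim)),  # weights on the guidance vector (the added term)
            np.zeros(h_dim),                       # bias
        )
        for name in ("i", "f", "o", "c")
    }


def glstm_step(x, h_prev, c_prev, g, params):
    """One step of an LSTM whose gates and candidate also see a guidance vector g.

    Assumption: g is the same semantic image vector at every time step.
    """
    pre = {}
    for name in ("i", "f", "o", "c"):
        W_x, W_h, W_g, b = params[name]
        pre[name] = W_x @ x + W_h @ h_prev + W_g @ g + b

    i = sigmoid(pre["i"])        # input gate
    f = sigmoid(pre["f"])        # forget gate
    o = sigmoid(pre["o"])        # output gate
    c_tilde = np.tanh(pre["c"])  # candidate cell content

    c = f * c_prev + i * c_tilde
    h = o * np.tanh(c)
    return h, c


# Usage: a 4-d word embedding, 3 hidden units, and a 5-d guidance vector.
params = init_params(x_dim=4, h_dim=3, g_dim=5)
h, c = glstm_step(np.ones(4), np.zeros(3), np.zeros(3), np.ones(5), params)
```

Setting `g` to zeros recovers an ordinary LSTM step in this sketch, which makes it easy to compare guided and unguided behaviour.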

Findings

01 gLSTM outperforms standard LSTM on benchmark datasets

02 Semantic guidance improves caption relevance

03 Length normalization enhances beam search results

Abstract

In this work we focus on the problem of image caption generation. We propose an extension of the long short term memory (LSTM) model, which we coin gLSTM for short. In particular, we add semantic information extracted from the image as extra input to each unit of the LSTM block, with the aim of guiding the model towards solutions that are more tightly coupled to the image content. Additionally, we explore different length normalization strategies for beam search in order to avoid favoring short sentences. On various benchmark datasets such as Flickr8K, Flickr30K and MS COCO, we obtain results that are on par with or even outperform the current state of the art.
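
The bias towards short sentences arises because a caption's log-probability is a sum of negative terms, so every extra word lowers the score. The sketch below illustrates one common remedy, dividing the summed log-probability by the length raised to a power `alpha`. This is an illustrative normalization and not necessarily one of the strategies the paper evaluates. The function names and the toy probabilities are assumptions.

```python
import math


def sequence_score(token_log_probs, alpha=1.0):
    """Length-normalized score: sum of log-probs divided by length**alpha.

    alpha=0 gives the raw summed log-probability (no normalization);
    alpha=1 gives the average log-probability per token.
    """
    length = len(token_log_probs)
    if length == 0:
        raise ValueError("cannot score an empty sequence")
    return sum(token_log_probs) / (length ** alpha)


def rank_candidates(candidates, alpha=1.0):
    """Sort (caption, token_log_probs) pairs from best to worst score."""
    return sorted(
        candidates,
        key=lambda cand: sequence_score(cand[1], alpha),
        reverse=True,
    )


# Toy beam: a short caption with mediocre per-word probability versus
# a longer caption whose words are individually more likely.
short = ("a dog", [math.log(0.5)] * 2)
longer = ("a dog runs on grass", [math.log(0.6)] * 5)

best_raw = rank_candidates([short, longer], alpha=0.0)[0][0]
best_normalized = rank_candidates([short, longer], alpha=1.0)[0][0]
```

Without normalization the short caption ranks first simply because it has fewer factors. With per-token averaging the longer caption wins, since each of its words is more probable.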

Code & Models

Repositories

liuqihan/Image-Caption (tf)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory