Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects

Ting Yao; Yingwei Pan; Yehao Li; Tao Mei

arXiv:1708.05271·cs.CV·August 18, 2017·33 cites

TL;DR

This paper introduces LSTM-C, a novel image captioning architecture that incorporates a copying mechanism, enabling the model to describe novel objects outside the training data by leveraging object recognition datasets.

Contribution

The paper presents LSTM-C, a new architecture that integrates a copying mechanism into CNN plus RNN image captioning models so that they can describe objects unseen in the training captions.

Findings

1. LSTM-C effectively describes novel objects in captions.
2. LSTM-C outperforms state-of-the-art models on MSCOCO and ImageNet.
3. The copying mechanism enhances caption diversity and accuracy.

Abstract

Image captioning often requires a large set of training image-sentence pairs. In practice, however, acquiring sufficient training pairs is always expensive, which limits the ability of recent captioning models to describe objects outside of the training corpora (i.e., novel objects). In this paper, we present Long Short-Term Memory with Copying Mechanism (LSTM-C), a new architecture that incorporates copying into the Convolutional Neural Networks (CNN) plus Recurrent Neural Networks (RNN) image captioning framework for describing novel objects in captions. Specifically, freely available object recognition datasets are leveraged to develop classifiers for novel objects. Our LSTM-C then integrates the standard word-by-word sentence generation by a decoder RNN with a copying mechanism, which may instead select words from novel objects at proper places in the output sentence.…
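The abstract describes a decoder that, at each step, either generates a word from its caption vocabulary or copies a word naming a novel object recognized by the external classifiers. The following is a minimal Python sketch of one such decoding step, assuming a CopyNet-style joint softmax in which generation scores and copy scores share a single normalizer. The function name `copy_augmented_distribution`, the words, and all scores are hypothetical; the copy scores are taken as given rather than computed, and this is not the authors' exact formulation.

```python
import math
from typing import Dict


def copy_augmented_distribution(
    gen_scores: Dict[str, float],
    copy_scores: Dict[str, float],
) -> Dict[str, float]:
    """Combine generation and copying modes with one shared normalizer.

    Assumption: a CopyNet-style joint softmax, not necessarily LSTM-C's exact form.
    gen_scores:  unnormalized score for each word in the caption vocabulary.
    copy_scores: unnormalized score for each detected object word; these words
                 may be missing from the caption vocabulary (novel objects).
    At least one score must be supplied in total.
    """
    all_scores = list(gen_scores.values()) + list(copy_scores.values())
    # Subtract the max score before exponentiating for numerical stability.
    m = max(all_scores)
    z = sum(math.exp(s - m) for s in all_scores)

    probs: Dict[str, float] = {}
    for word, s in gen_scores.items():
        probs[word] = probs.get(word, 0.0) + math.exp(s - m) / z
    for word, s in copy_scores.items():
        # A word reachable by both modes accumulates probability from each.
        probs[word] = probs.get(word, 0.0) + math.exp(s - m) / z
    return probs


# Hypothetical decoding step: "zebra" is absent from the caption vocabulary
# but was recognized by an object classifier trained on a recognition dataset.
vocab_scores = {"a": 1.0, "dog": 2.0, "on": 0.5, "<unk>": 0.1}
detected_scores = {"zebra": 2.5}

p = copy_augmented_distribution(vocab_scores, detected_scores)
best = max(p, key=p.get)  # greedy choice for this step
print(best, round(p[best], 3))
```

Because both modes share one normalizer, a confidently recognized novel object such as the hypothetical "zebra" can outrank every in-vocabulary word, which is how a copy-augmented decoder can emit words that never appeared in the paired training captions.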


Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning