Exploring Nearest Neighbor Approaches for Image Captioning

Jacob Devlin; Saurabh Gupta; Ross Girshick; Margaret Mitchell; C. Lawrence Zitnick

arXiv:1505.04467 · cs.CV · May 19, 2015 · 160 cites

TL;DR

This paper investigates nearest neighbor methods for image captioning, finding they perform comparably to recent models on automatic metrics but are less preferred by humans, highlighting the gap between automated and human evaluation.

Contribution

The study systematically evaluates nearest neighbor baselines for image captioning and compares their performance to state-of-the-art generative models.

Findings

1. Nearest neighbor approaches perform similarly to recent models on automatic metrics.

2. Human evaluations favor models that generate novel captions over nearest neighbor methods.

3. Nearest neighbor methods are competitive baselines for image captioning tasks.

Abstract

We explore a variety of nearest neighbor baseline approaches for image captioning. These approaches find a set of nearest neighbor images in the training set from which a caption may be borrowed for the query image. We select a caption for the query image by finding the caption that best represents the "consensus" of the set of candidate captions gathered from the nearest neighbor images. When measured by automatic evaluation metrics on the MS COCO caption evaluation server, these approaches perform as well as many recent approaches that generate novel captions. However, human studies show that a method that generates novel captions is still preferred over the nearest neighbor approach.
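The retrieve-then-select procedure described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: image features are assumed to be precomputed vectors compared by cosine similarity, the hypothetical `ngram_overlap` function is a simplified n-gram precision standing in for a caption-similarity metric such as BLEU or CIDEr, and "consensus" is assumed to mean the candidate with the highest mean similarity to the other candidates. All function names are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest_images(query_feat, train_feats, k):
    """Indices of the k training images most cosine-similar to the query (assumed features)."""
    order = sorted(range(len(train_feats)),
                   key=lambda i: cosine(query_feat, train_feats[i]),
                   reverse=True)
    return order[:k]


def ngram_overlap(a, b, n_max=2):
    """Hypothetical stand-in for BLEU/CIDEr: mean clipped n-gram precision of caption a against b."""
    ta, tb = a.lower().split(), b.lower().split()
    scores = []
    for n in range(1, n_max + 1):
        ga = Counter(tuple(ta[i:i + n]) for i in range(len(ta) - n + 1))
        gb = Counter(tuple(tb[i:i + n]) for i in range(len(tb) - n + 1))
        total = sum(ga.values())
        if total == 0:
            scores.append(0.0)
            continue
        # Clip each n-gram count by its count in the other caption.
        matched = sum(min(c, gb[g]) for g, c in ga.items())
        scores.append(matched / total)
    return sum(scores) / len(scores)


def consensus_caption(candidates):
    """Return the candidate with the highest mean similarity to all other candidates."""
    best, best_score = None, -1.0
    for i, cand in enumerate(candidates):
        others = [o for j, o in enumerate(candidates) if j != i]
        score = sum(ngram_overlap(cand, o) for o in others) / max(len(others), 1)
        if score > best_score:  # strict '>' keeps the first caption on ties
            best, best_score = cand, score
    return best


def caption_query(query_feat, train_feats, train_captions, k=2):
    """Gather captions from the k nearest training images and borrow the consensus one."""
    neighbors = nearest_images(query_feat, train_feats, k)
    candidates = [cap for i in neighbors for cap in train_captions[i]]
    return consensus_caption(candidates)


if __name__ == "__main__":
    train_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    train_captions = [
        ["a dog runs on the grass", "a dog playing outside"],
        ["a brown dog runs on the grass", "a cat sleeping"],
        ["a plate of food on a table", "food on a table"],
    ]
    print(caption_query([0.95, 0.05], train_feats, train_captions, k=2))
    # prints: a dog runs on the grass
```

In this toy example the two dog images are retrieved, and "a dog runs on the grass" wins because it shares the most n-grams with the other candidates, while the outlier "a cat sleeping" scores lowest. Note that the borrowed caption is always copied verbatim from the training set, which is exactly why such a method cannot produce novel captions.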

Code & Models

Repositories

mjhucla/mRNN-CR

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning