A Baseline for Detecting Out-of-Distribution Examples in Image Captioning

Gabi Shalev; Gal-Lev Shalev; Joseph Keshet

arXiv:2207.05418·cs.CV·July 13, 2022

TL;DR

This paper addresses the challenge of detecting out-of-distribution images in image captioning by evaluating the effectiveness of caption likelihood scores for identifying images that differ from the training distribution.

Contribution

The paper formulates the OOD detection problem in image captioning, proposes an evaluation setup, and demonstrates the effectiveness of caption likelihood scores for OOD detection.

Findings

01

Likelihood scores effectively detect OOD images.

02

Relatedness between images and captions is captured in likelihood scores.

03

Proposed evaluation setup for OOD detection in captioning.

Abstract

Image captioning research has achieved breakthroughs in recent years by developing neural models that can generate diverse and high-quality descriptions for images drawn from the same distribution as the training images. However, when facing out-of-distribution (OOD) images, such as corrupted images or images containing unknown objects, the models fail to generate relevant captions. In this paper, we consider the problem of OOD detection in image captioning. We formulate the problem and suggest an evaluation setup for assessing the model's performance on the task. Then, we analyze and show the effectiveness of the caption's likelihood score at detecting and rejecting OOD images, which implies that the relatedness between the input image and the generated caption is encapsulated within the score.
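The abstract does not spell out how the likelihood score is computed or evaluated, so the following is only a minimal sketch under stated assumptions. It assumes the captioning model exposes the probability it assigned to each generated token, that the score is the length-normalized log-likelihood of the caption, and that an image is rejected when the score falls below a threshold chosen on held-out data. The function names, the normalization, and the use of AUROC as the evaluation metric are illustrative choices, not details confirmed by the paper.

```python
import math
from typing import Sequence


def caption_log_likelihood(token_probs: Sequence[float], normalize: bool = True) -> float:
    """Log-likelihood of a generated caption from its per-token probabilities.

    Assumption: length normalization is applied so that long captions are not
    penalized merely for having more tokens.
    """
    if not token_probs:
        raise ValueError("caption must contain at least one token")
    total = sum(math.log(p) for p in token_probs)
    return total / len(token_probs) if normalize else total


def is_ood(token_probs: Sequence[float], threshold: float) -> bool:
    """Flag an image as OOD when its caption score is below the threshold.

    The threshold is hypothetical; in practice it would be tuned on
    held-out in-distribution data.
    """
    return caption_log_likelihood(token_probs) < threshold


def auroc(in_scores: Sequence[float], ood_scores: Sequence[float]) -> float:
    """Threshold-free separability of the two score sets.

    Computed as the fraction of (in-distribution, OOD) pairs in which the
    in-distribution score is higher, counting ties as half.
    """
    if not in_scores or not ood_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for s_in in in_scores:
        for s_ood in ood_scores:
            if s_in > s_ood:
                wins += 1.0
            elif s_in == s_ood:
                wins += 0.5
    return wins / (len(in_scores) * len(ood_scores))


# Toy per-token probabilities, invented purely for illustration.
confident_caption = [0.9, 0.8, 0.85]
uncertain_caption = [0.1, 0.2, 0.15]
in_scores = [caption_log_likelihood(confident_caption)]
ood_scores = [caption_log_likelihood(uncertain_caption)]
separability = auroc(in_scores, ood_scores)
```

With these toy inputs the confident caption receives the higher score, so a threshold placed between the two scores would accept the first image and reject the second. A real evaluation would use scores from a trained captioning model on in-distribution and OOD test sets.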

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Advanced Image and Video Retrieval Techniques