A Baseline for Detecting Out-of-Distribution Examples in Image Captioning
Gabi Shalev, Gal-Lev Shalev, Joseph Keshet

TL;DR
This paper addresses the challenge of detecting out-of-distribution images in image captioning by evaluating the effectiveness of caption likelihood scores for identifying images that differ from the training distribution.
Contribution
The paper formulates the OOD detection problem in image captioning, proposes an evaluation setup, and demonstrates the effectiveness of caption likelihood scores for OOD detection.
Findings
Likelihood scores effectively detect OOD images.
Relatedness between images and captions is captured in likelihood scores.
The paper proposes an evaluation setup for OOD detection in captioning.
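One way to picture such an evaluation setup is a threshold-free ranking metric over scores for in-distribution and OOD images. The sketch below computes AUROC, a common choice in OOD detection work. This summary does not say which metric the paper uses, so both the metric and the function name `auroc` are assumptions made here for illustration.

```python
def auroc(in_dist_scores, ood_scores):
    """Estimate AUROC for separating in-distribution from OOD images.

    Assumption: a higher score means "more in-distribution", e.g. a
    caption log-likelihood. The value is the probability that a randomly
    chosen in-distribution score exceeds a randomly chosen OOD score,
    with ties counted as one half.
    """
    if not in_dist_scores or not ood_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for s_in in in_dist_scores:
        for s_ood in ood_scores:
            if s_in > s_ood:
                wins += 1.0
            elif s_in == s_ood:
                wins += 0.5
    return wins / (len(in_dist_scores) * len(ood_scores))
```

An AUROC of 1.0 means every in-distribution image scores above every OOD image, while 0.5 corresponds to chance-level ranking.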
Abstract
Image captioning research has achieved breakthroughs in recent years by developing neural models that can generate diverse and high-quality descriptions for images drawn from the same distribution as the training images. However, when facing out-of-distribution (OOD) images, such as corrupted images or images containing unknown objects, the models fail to generate relevant captions. In this paper, we consider the problem of OOD detection in image captioning. We formulate the problem and suggest an evaluation setup for assessing a model's performance on the task. Then, we analyze and show the effectiveness of the caption's likelihood score at detecting and rejecting OOD images, which implies that the relatedness between the input image and the generated caption is encapsulated within the score.
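The idea of using a caption's likelihood as a rejection score can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the captioning model exposes the probability it assigned to each generated token, and the function names, the length normalization, and the threshold value are all hypothetical choices made here.

```python
import math

def caption_log_likelihood(token_probs, length_normalize=True):
    """Score a generated caption by the log-probability of its tokens.

    token_probs: per-token probabilities p(w_t | w_<t, image), assumed to
    come from the captioning model's decoder. Length normalization is an
    assumption of this sketch, not something stated in the summary.
    """
    if not token_probs:
        raise ValueError("caption must contain at least one token")
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("token probabilities must lie in (0, 1]")
    total = sum(math.log(p) for p in token_probs)
    return total / len(token_probs) if length_normalize else total

def is_ood(score, threshold):
    """Reject the image as OOD when its caption score falls below the threshold."""
    return score < threshold

# Hypothetical usage: a confident caption versus a low-confidence one.
confident = caption_log_likelihood([0.9, 0.8])
uncertain = caption_log_likelihood([0.1, 0.2, 0.1])
print(is_ood(confident, threshold=-1.0), is_ood(uncertain, threshold=-1.0))
```

In practice the threshold would be picked on held-out data, for example to reach a target true-positive rate on in-distribution images.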
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Advanced Image and Video Retrieval Techniques
