Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning

Ali Furkan Biten; Lluis Gomez; Dimosthenis Karatzas

arXiv:2110.01705 · cs.CV · November 3, 2021

TL;DR

This paper introduces three training augmentation techniques for image captioning models that effectively reduce object hallucination without requiring additional data or larger models, improving alignment with human expectations.

Contribution

The paper proposes novel augmentation methods that decrease object hallucination in captioning models without increasing data or model complexity.

Findings

01 Significant reduction in object hallucination metrics.

02 Decreased dependency on visual features.

03 Methods are simple and do not require additional data.

Abstract

Explaining an image with missing or non-existent objects is known as object bias (hallucination) in image captioning. This behaviour is quite common in state-of-the-art captioning models and is undesirable to humans. To decrease object hallucination in captioning, we propose three simple yet efficient training augmentation methods for sentences that require no new training data and no increase in model size. Through extensive analysis, we show that the proposed methods can significantly diminish our models' object bias on hallucination metrics. Moreover, we experimentally demonstrate that our methods decrease the dependency on visual features. All of our code, configuration files and model weights will be made public.
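The abstract refers to "hallucination metrics" without naming them here. As a rough illustration only, the sketch below computes two rates of the kind commonly used for this purpose (in the spirit of CHAIR-style metrics): the fraction of mentioned objects that are not in the image, and the fraction of captions containing at least one such object. The vocabulary, the function names and the naive tokenization are all hypothetical and are not taken from the paper or its repository.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical toy vocabulary; a real evaluation would use the dataset's
# object categories together with synonym lists.
OBJECT_VOCAB = {"clock", "beach", "dog", "surfboard", "umbrella"}


def mentioned_objects(caption: str, vocab: Set[str] = OBJECT_VOCAB) -> List[str]:
    """Return vocabulary objects mentioned in a caption (naive tokenization)."""
    tokens = caption.lower().replace(".", " ").replace(",", " ").split()
    return [t for t in tokens if t in vocab]


def hallucination_rates(samples: Iterable[Tuple[str, Set[str]]]) -> Tuple[float, float]:
    """Compute (per-mention rate, per-caption rate) of hallucinated objects.

    Each sample is (caption, set of objects actually present in the image).
    """
    total_mentions = 0
    hallucinated_mentions = 0
    total_captions = 0
    captions_with_hallucination = 0
    for caption, present in samples:
        mentions = mentioned_objects(caption)
        wrong = [m for m in mentions if m not in present]
        total_mentions += len(mentions)
        hallucinated_mentions += len(wrong)
        total_captions += 1
        captions_with_hallucination += 1 if wrong else 0
    per_mention = hallucinated_mentions / total_mentions if total_mentions else 0.0
    per_caption = captions_with_hallucination / total_captions if total_captions else 0.0
    return per_mention, per_caption


if __name__ == "__main__":
    samples = [
        ("A clock on the beach.", {"beach"}),         # "clock" is hallucinated
        ("A dog on the beach", {"dog", "beach"}),     # no hallucination
    ]
    print(hallucination_rates(samples))
```

Lower values on both rates would indicate less object hallucination; a real evaluation would also need proper tokenization and handling of plurals and synonyms, which this sketch omits.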

Code & Models

Repositories

furkanbiten/object-bias
PyTorch · Official

Videos

Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning · YouTube

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization