Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning
Ali Furkan Biten, Lluis Gomez, Dimosthenis Karatzas

TL;DR
This paper introduces three training augmentation techniques for image captioning models that effectively reduce object hallucination without requiring additional data or larger models, improving alignment with human expectations.
Contribution
The paper proposes novel augmentation methods that decrease object hallucination in captioning models without increasing data or model complexity.
Findings
Significant reduction in object hallucination, as measured by hallucination metrics.
Decreased dependency on visual features.
Methods are simple and do not require additional data.
Abstract
Describing an image with objects that are missing from it or do not exist in it is known as object bias (hallucination) in image captioning. This behaviour is quite common in state-of-the-art captioning models and is undesirable to humans. To decrease object hallucination in captioning, we propose three simple yet efficient training augmentation methods for sentences that require neither new training data nor an increase in model size. Through extensive analysis, we show that the proposed methods significantly reduce our models' object bias on hallucination metrics. Moreover, we experimentally demonstrate that our methods decrease the dependency on the visual features. All of our code, configuration files and model weights will be made public.
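The abstract reports results on "hallucination metrics" without naming them on this page. A widely used pair for captioning is CHAIR (Rohrbach et al., 2018): CHAIR_i is the fraction of mentioned objects that are absent from the image, and CHAIR_s is the fraction of captions containing at least one such object. The sketch below assumes the metrics are of this type, and it assumes object words have already been extracted from each caption and mapped to canonical categories. That extraction and synonym-mapping step is the hard part of the real metric and is not shown. The function name `chair` is illustrative only.

```python
from typing import Sequence, Set, Tuple


def chair(
    mentioned: Sequence[Sequence[str]],
    present: Sequence[Set[str]],
) -> Tuple[float, float]:
    """Return (CHAIR_i, CHAIR_s) for a batch of generated captions.

    mentioned[k]: object categories already extracted from the k-th caption
                  (assumed pre-processed; extraction is not done here).
    present[k]:   object categories annotated as present in the k-th image.
    """
    if len(mentioned) != len(present):
        raise ValueError("need one ground-truth set per caption")

    total_mentions = 0         # all object mentions across captions
    hallucinated_mentions = 0  # mentions of objects absent from the image
    captions_with_halluc = 0   # captions with at least one hallucinated object

    for objs, gt in zip(mentioned, present):
        bad = [o for o in objs if o not in gt]
        total_mentions += len(objs)
        hallucinated_mentions += len(bad)
        if bad:
            captions_with_halluc += 1

    # Lower is better for both scores.
    chair_i = hallucinated_mentions / total_mentions if total_mentions else 0.0
    chair_s = captions_with_halluc / len(mentioned) if mentioned else 0.0
    return chair_i, chair_s


if __name__ == "__main__":
    caps = [["clock", "beach"], ["dog"], ["person", "surfboard"]]
    gts = [{"beach", "person"}, {"dog"}, {"person", "surfboard"}]
    # "clock" is the only hallucinated object: CHAIR_i = 1/5, CHAIR_s = 1/3
    print(chair(caps, gts))
```

In this toy example, a caption mentioning a clock on a beach image with no clock counts as one hallucinated mention out of five, and one hallucinating caption out of three.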
Videos
Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning (YouTube)
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
