MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification
Laura Fieback, Jakob Spiegelberg, Hanno Gottschalk

TL;DR
MetaToken is a lightweight, token-level hallucination detector for large vision-language models that operates efficiently without ground truth data, improving trustworthiness in image captioning tasks.
Contribution
Introduces MetaToken, a novel, low-cost token-level hallucination detection method applicable to any open-source LVLM without ground truth data.
Findings
MetaToken effectively detects hallucinations across four state-of-the-art LVLMs.
It operates at negligible computational cost.
It provides calibrated hallucination detection without requiring ground truth.
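One common way to check whether a detector's scores are calibrated is a binned calibration error, which compares the mean predicted score in each bin with the observed fraction of hallucinated tokens. The sketch below is illustrative only: the page does not state which calibration metric the authors use, and the function name, binning scheme, and inputs are assumptions.

```python
def expected_calibration_error(scores, labels, n_bins=10):
    """Binned calibration error for binary scores in [0, 1].

    scores: predicted probabilities that a token is hallucinated (assumed input).
    labels: 1 if the token is hallucinated, 0 otherwise (assumed input).
    """
    bins = [[] for _ in range(n_bins)]
    for s, y in zip(scores, labels):
        # Equal-width bins; a score of exactly 1.0 goes into the last bin.
        idx = min(int(s * n_bins), n_bins - 1)
        bins[idx].append((s, y))

    total = len(scores)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_score = sum(s for s, _ in bucket) / len(bucket)
        frac_positive = sum(y for _, y in bucket) / len(bucket)
        # Weight each bin's gap by the share of samples it holds.
        ece += len(bucket) / total * abs(mean_score - frac_positive)
    return ece
```

A value near zero means the scores can be read as probabilities; a large value means the detector is over- or under-confident.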
Abstract
Large Vision Language Models (LVLMs) have shown remarkable capabilities in multimodal tasks such as visual question answering and image captioning. However, inconsistencies between the visual information and the generated text, a phenomenon referred to as hallucinations, remain an unsolved problem for the trustworthiness of LVLMs. To address this problem, recent work has proposed incorporating computationally costly Large (Vision) Language Models to detect hallucinations at the sentence or subsentence level. In this work, we introduce MetaToken, a lightweight binary classifier that detects hallucinations at the token level at negligible cost. Based on a statistical analysis, we reveal key factors of hallucinations in LVLMs. MetaToken can be applied to any open-source LVLM without any knowledge of ground truth data, providing calibrated hallucination detection. We…
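The idea of a lightweight binary classifier over token-level signals can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' method: the two features (entropy and one minus the maximum probability of a token's output distribution), the plain logistic regression, the synthetic training data, and all function names are hypothetical. The sketch also assumes labelled tokens are available to fit the classifier, while scoring a new token needs only the model's own output distribution.

```python
import math
import random


def token_features(probs):
    """Map one token's output distribution to two uncertainty features (assumed)."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return [entropy, 1.0 - max(probs)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def hallucination_score(probs, w, b):
    """Score in (0, 1); higher means the token looks more like a hallucination."""
    x = token_features(probs)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def toy_distribution(peak, size=5):
    """Synthetic distribution: one token gets `peak`, the rest share the remainder."""
    rest = (1.0 - peak) / (size - 1)
    return [peak] + [rest] * (size - 1)


# Synthetic toy data (assumption): peaked distributions are labelled 0 (faithful),
# flat distributions are labelled 1 (hallucinated). Real labels would come from
# annotated captions.
random.seed(0)
confident = [toy_distribution(random.uniform(0.85, 0.99)) for _ in range(50)]
uncertain = [toy_distribution(random.uniform(0.25, 0.45)) for _ in range(50)]
X = [token_features(p) for p in confident + uncertain]
y = [0] * 50 + [1] * 50

w, b = fit_logistic(X, y)
score_confident = hallucination_score(toy_distribution(0.95), w, b)
score_uncertain = hallucination_score(toy_distribution(0.30), w, b)
```

The point of the design is cost: once the small classifier is fitted, scoring a token takes a handful of arithmetic operations on values the LVLM already produces, with no second large model in the loop.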
Taxonomy
Topics: Image Retrieval and Classification Techniques · COVID-19 diagnosis using AI · Misinformation and Its Impacts
