MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification

Laura Fieback; Jakob Spiegelberg; Hanno Gottschalk

arXiv:2405.19186·cs.CV·March 26, 2025·1 cites

Open Access

TL;DR

MetaToken is a lightweight, token-level hallucination detector for large vision-language models that operates efficiently without ground truth data, improving trustworthiness in image captioning tasks.

Contribution

Introduces MetaToken, a novel, low-cost token-level hallucination detection method applicable to any open-source LVLM without ground truth data.

Findings

01. MetaToken effectively detects hallucinations across four state-of-the-art LVLMs.

02. It operates at negligible computational cost.

03. It provides calibrated hallucination detection without requiring ground truth.

Abstract

Large Vision Language Models (LVLMs) have shown remarkable capabilities in multimodal tasks such as visual question answering and image captioning. However, inconsistencies between the visual information and the generated text, a phenomenon referred to as hallucinations, remain an unsolved problem with regard to the trustworthiness of LVLMs. To address this problem, recent works have proposed incorporating computationally costly Large (Vision) Language Models to detect hallucinations at the sentence or subsentence level. In this work, we introduce MetaToken, a lightweight binary classifier that detects hallucinations at the token level at negligible cost. Based on a statistical analysis, we reveal key factors of hallucinations in LVLMs. MetaToken can be applied to any open-source LVLM without any knowledge of ground truth data, providing a calibrated detection of hallucinations. We…
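To make the idea of a token-level meta classifier concrete, here is a minimal sketch and not the authors' implementation. It assumes two illustrative per-token features (the entropy of the next-token distribution and the log-probability of the generated token), a plain logistic regression as the binary classifier, and synthetic toy data in place of real LVLM outputs. The feature choice, the helper names (token_features, fit_logistic, predict_proba, toy_distribution), and the toy labels are all assumptions made for illustration; the text above does not specify them.

```python
import math
import random


def token_features(probs, chosen):
    """Two assumed uncertainty features for one generated token.

    probs  : next-token probability distribution (sums to 1)
    chosen : index of the token that was actually generated
    """
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    log_prob = math.log(probs[chosen])
    return [entropy, log_prob]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit a logistic regression with full-batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that the token is hallucinated."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def toy_distribution(rng, peaked):
    """Synthetic 3-token distribution: peaked (confident) or flat (uncertain)."""
    top = rng.uniform(0.85, 0.98) if peaked else rng.uniform(0.35, 0.5)
    rest = (1.0 - top) / 2.0
    return [top, rest, rest]


# Toy training set (assumption): confident tokens are labeled 0 (faithful),
# uncertain tokens are labeled 1 (hallucinated). Real labels would have to
# come from annotated captions, which this sketch does not have.
rng = random.Random(0)
X, y = [], []
for _ in range(20):
    X.append(token_features(toy_distribution(rng, peaked=True), 0))
    y.append(0)
    X.append(token_features(toy_distribution(rng, peaked=False), 0))
    y.append(1)

w, b = fit_logistic(X, y)

# Score two unseen tokens; no ground truth is needed at this stage.
p_uncertain = predict_proba(w, b, token_features([0.4, 0.3, 0.3], 0))
p_confident = predict_proba(w, b, token_features([0.95, 0.025, 0.025], 0))
print(f"uncertain token: {p_uncertain:.3f}, confident token: {p_confident:.3f}")
```

In this sketch, labels are only used to fit the classifier. Scoring a new token needs nothing beyond the model's own output distribution, which is one way to read the claim that detection works without ground truth data. The classifier is a handful of weights, so its cost is small next to LVLM inference.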

Taxonomy

Topics: Image Retrieval and Classification Techniques · COVID-19 diagnosis using AI · Misinformation and Its Impacts