Taming Object Hallucinations with Verified Atomic Confidence Estimation

Jiarui Liu; Weihao Xuan; Zhijing Jin; Mona Diab

arXiv:2511.09228 · cs.CV · November 13, 2025

TL;DR

TACO is a framework that reduces hallucinations in multimodal large language models by decomposing responses into atomic queries, estimating confidence through self-verification, and refining answers, thereby improving reliability and calibration.

Contribution

TACO introduces a novel self-verification and confidence calibration framework that enhances the faithfulness of MLLMs without external vision experts.

Findings

01 Outperforms direct prompting and Visual Contrastive Decoding

02 Reduces systematic biases in MLLMs

03 Improves confidence calibration across benchmarks

Abstract

Multimodal Large Language Models (MLLMs) often suffer from hallucinations, particularly errors in object existence, attributes, or relations, which undermine their reliability. We introduce TACO (Verified Atomic Confidence Estimation), a simple framework that mitigates hallucinations through self-verification and confidence calibration without relying on external vision experts. TACO decomposes responses into atomic queries, paraphrases them to reduce sensitivity to wording, and estimates confidence using self-consistency (black-box) or self-confidence (gray-box) aggregation, before refining answers with a language model. Experiments on five benchmarks (POPE, MME, HallusionBench, AMBER, and MM-Hal Bench) with two MLLMs (LLaVA-1.5-7B and CogVLM2) show that TACO consistently outperforms direct prompting and Visual Contrastive Decoding, reduces systematic biases, and…
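
The two confidence-aggregation modes named in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything below is an assumption: the function names are hypothetical, the atomic queries are assumed to be yes/no questions, self-consistency is approximated as the fraction of "yes" answers across paraphrases, self-confidence as the mean probability the model assigns to "yes", and the 0.5 threshold is arbitrary. It is not the authors' implementation.

```python
from statistics import mean
from typing import Sequence


def self_consistency_confidence(answers: Sequence[str]) -> float:
    """Black-box mode (assumed form): fraction of sampled answers that say 'yes'.

    `answers` holds the model's text replies to paraphrases of one atomic query.
    """
    if not answers:
        raise ValueError("need at least one answer")
    votes = [a.strip().lower().startswith("yes") for a in answers]
    return sum(votes) / len(votes)


def self_confidence(yes_probs: Sequence[float]) -> float:
    """Gray-box mode (assumed form): mean P('yes') across paraphrases.

    `yes_probs` holds the probability the model assigned to answering 'yes'
    for each paraphrase, which requires access to token probabilities.
    """
    if not yes_probs:
        raise ValueError("need at least one probability")
    return mean(yes_probs)


def is_verified(confidence: float, threshold: float = 0.5) -> bool:
    """Treat an atomic claim as supported when confidence reaches the threshold.

    The threshold value is an illustrative assumption.
    """
    return confidence >= threshold


if __name__ == "__main__":
    # Replies to four paraphrases of a hypothetical query "Is there a dog in the image?"
    replies = ["Yes", "yes, there is", "No", "Yes."]
    conf = self_consistency_confidence(replies)
    print(conf, is_verified(conf))
```

In this sketch, claims that fail the check would be passed to the refinement step so the language model can drop or revise them; how TACO actually performs that refinement is not specified in the text above.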

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Data Quality and Management · Topic Modeling