ClueTracer: Question-to-Vision Clue Tracing for Training-Free Hallucination Suppression in Multimodal Reasoning

Gongli Xi; Kun Wang; Zeming Gao; Huahui Yi; Haolang Lu; Ye Tian; Wendong Wang

arXiv:2602.02004·cs.CV·February 3, 2026

Open Access

TL;DR

ClueTracer is a training-free, architecture-agnostic method that traces reasoning pathways to suppress hallucinations in multimodal models, improving reasoning benchmark performance by 1.21x.

Contribution

The paper introduces ClueTracer, a plugin that localizes task-relevant visual clues during reasoning, reducing hallucinations in multimodal models without extra training.

Findings

01. ClueTracer improves reasoning benchmark performance by 1.21x.

02. It improves non-reasoning models by 1.14x.

03. ClueTracer is training-free and architecture-agnostic.

Abstract

Large multimodal reasoning models solve challenging visual problems via explicit long-chain inference: they gather visual clues from images and decode clues into textual tokens. Yet this capability also increases hallucinations, where the model generates content that is not supported by the input image or the question. To understand this failure mode, we identify "reasoning drift": during clue gathering, the model over-focuses on question-irrelevant entities, diluting focus on task-relevant cues and gradually decoupling the reasoning trace from visual grounding. As a consequence, many inference-time localization or intervention methods developed for non-reasoning models fail to pinpoint the true clues in reasoning settings. Motivated by these insights, we introduce ClueRecall, a metric for assessing visual clue retrieval, and present ClueTracer, a training-free, parameter-free, and…

Taxonomy

Topics: Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)