Visual Attention Reasoning via Hierarchical Search and Self-Verification

Wei Cai; Jian Zhao; Yuchen Yuan; Tianle Zhang; Ming Zhu; Haichuan Tang; Xuelong Li

arXiv:2510.18619 · cs.AI · January 27, 2026

Open Access

TL;DR

This paper introduces Visual Attention Reasoning (VAR), a reinforcement learning framework that improves multimodal large language models by enabling hierarchical search and self-verification to reduce hallucinations and enhance visual grounding.

Contribution

The paper presents a hierarchical search and self-verification framework with explicit visual grounding via bounding boxes, supported by theoretical analysis and by experiments on hallucination and safety benchmarks.

Findings

01. Significantly reduces hallucinations in MLLMs

02. Enforces traceable evidence grounding with bounding boxes

03. Outperforms state-of-the-art methods on safety benchmarks

Abstract

Multimodal Large Language Models (MLLMs) frequently hallucinate due to their reliance on fragile, linear reasoning and weak visual grounding. We propose Visual Attention Reasoning (VAR), a reinforcement learning framework that reformulates reasoning as a hierarchical search with self-verification. VAR enforces traceable evidence grounding by generating explicit bounding boxes, guided by a novel reward function combining geometric precision and semantic sufficiency. Furthermore, it replaces linear Chain-of-Thought with a tree-search policy capable of backtracking to correct logical errors. Theoretical analysis validates the framework's reliability, and extensive experiments demonstrate that VAR significantly outperforms state-of-the-art methods on complex hallucination and safety benchmarks.
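
The abstract describes a reward that combines geometric precision with semantic sufficiency but does not give its form on this page. As a rough illustration only, the sketch below assumes geometric precision is measured by intersection-over-union (IoU) between a predicted and a reference bounding box, that semantic sufficiency arrives as a score in [0, 1] from some external judge, and that the two are mixed by a weight `alpha`. The names `iou`, `grounding_reward`, and `alpha`, and the weighted-sum form itself, are assumptions made for this example and are not taken from the paper.

```python
from typing import Tuple

# Box format assumed here: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounding_reward(pred: Box, ref: Box, semantic_score: float,
                     alpha: float = 0.5) -> float:
    """Hypothetical reward: weighted sum of IoU and a semantic score.

    semantic_score is assumed to be a value in [0, 1] saying whether the
    cropped region contains enough evidence to answer the question.
    """
    if not 0.0 <= semantic_score <= 1.0:
        raise ValueError("semantic_score must lie in [0, 1]")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * iou(pred, ref) + (1.0 - alpha) * semantic_score
```

Under these assumptions, a box that matches the reference exactly but covers a region judged semantically insufficient would earn only the geometric share of the reward, which is one way a training signal could discourage well-localized but uninformative evidence.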

Taxonomy

Topics: Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning · Advanced Graph Neural Networks