Saliency-Aware Multi-Route Thinking: Revisiting Vision-Language Reasoning

Mingjia Shi; Yinhan He; Yaochen Zhu; Jundong Li

arXiv:2602.16702 · cs.CV · February 19, 2026

Open Access

TL;DR

This paper introduces Saliency-Aware Principle (SAP) selection, a model-agnostic, training-free method for vision-language reasoning that enables multi-route inference, reducing hallucinations and improving reasoning stability.

Contribution

The paper proposes SAP, which operates on high-level reasoning principles rather than token-level trajectories and supports multi-route inference, improving stability and accuracy in vision-language models without additional training.

Findings

01 SAP reduces object hallucination in VLMs.

02 SAP achieves more stable reasoning with lower latency.

03 SAP performs competitively with existing methods under similar token budgets.

Abstract

Vision-language models (VLMs) aim to reason by jointly leveraging visual and textual modalities. While allocating additional inference-time computation has proven effective for large language models (LLMs), achieving similar scaling in VLMs remains challenging. A key obstacle is that visual inputs are typically provided only once at the start of generation, while textual reasoning (e.g., early visual summaries) is generated autoregressively, causing reasoning to become increasingly text-dominated and allowing early visual grounding errors to accumulate. Moreover, vanilla guidance for visual grounding during inference is often coarse and noisy, making it difficult to steer reasoning over long texts. To address these challenges, we propose Saliency-Aware Principle (SAP) selection. SAP operates on high-level reasoning principles rather than token-level trajectories, which enable…

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI