Saliency-Aware Multi-Route Thinking: Revisiting Vision-Language Reasoning
Mingjia Shi, Yinhan He, Yaochen Zhu, Jundong Li

TL;DR
This paper introduces Saliency-Aware Principle (SAP) selection, a model-agnostic method that improves vision-language reasoning through multi-route inference, reducing hallucinations and stabilizing reasoning without extra training.
Contribution
The paper proposes SAP, a saliency-aware selection scheme that operates on high-level reasoning principles and supports multi-route inference, improving stability and accuracy in vision-language models without additional training.
Findings
SAP reduces object hallucination in VLMs.
SAP achieves more stable reasoning with lower latency.
SAP performs competitively with existing methods under similar token budgets.
Abstract
Vision-language models (VLMs) aim to reason by jointly leveraging visual and textual modalities. While allocating additional inference-time computation has proven effective for large language models (LLMs), achieving similar scaling in VLMs remains challenging. A key obstacle is that visual inputs are typically provided only once at the start of generation, while textual reasoning (e.g., early visual summaries) is generated autoregressively, causing reasoning to become increasingly text-dominated and allowing early visual grounding errors to accumulate. Moreover, vanilla guidance for visual grounding during inference is often coarse and noisy, making it difficult to steer reasoning over long texts. To address these challenges, we propose Saliency-Aware Principle (SAP) selection. SAP operates on high-level reasoning principles rather than token-level trajectories, which enable…
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI
