Tinted Frames: Question Framing Blinds Vision-Language Models
Wan-Cyuan Fan, Jiayun Luo, Declan Kutscher, Leonid Sigal, Ritwik Gupta

TL;DR
This paper shows that vision-language models are selectively blind: the linguistic framing of a question changes how much attention they give to the image, and that shift affects accuracy. It also proposes a prompt-tuning method to improve visual grounding.
Contribution
The study uncovers how framing influences attention in VLMs and introduces a prompt-tuning approach to enhance visual grounding and robustness across framings.
Findings
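Linguistic framing alters both the amount and the distribution of attention over the image, even when alternative framings demand identical visual reasoning.
Constrained framings (multiple choice, yes/no) draw substantially less attention to image context than open-ended framing, reduce focus on task-relevant regions, and shift attention towards uninformative tokens.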
Prompt-tuning improves visual grounding and model robustness.
Abstract
Vision-Language Models (VLMs) have been shown to be blind, often underutilizing their visual inputs even on tasks that require visual reasoning. In this work, we demonstrate that VLMs are selectively blind. They modulate the amount of attention applied to visual inputs based on linguistic framing even when alternative framings demand identical visual reasoning. Using visual attention as a probe, we quantify how framing alters both the amount and distribution of attention over the image. Constrained framings, such as multiple choice and yes/no, induce substantially lower attention to image context compared to open-ended, reduce focus on task-relevant regions, and shift attention towards uninformative tokens. We further demonstrate that this attention misallocation is the principal cause of degraded accuracy and cross-framing inconsistency. Building on this mechanistic insight, we…
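The abstract describes using visual attention as a probe for both how much attention reaches the image and how it is distributed. The sketch below is one possible way to compute such quantities for a single attention row. It is not the authors' implementation: the function names, the mask inputs, and the choice to normalize by total attention mass are all assumptions made for illustration, and the numbers in the usage example are made up.

```python
from typing import Sequence


def image_attention_share(attn_row: Sequence[float],
                          image_mask: Sequence[bool]) -> float:
    """Fraction of one query token's attention mass that lands on image tokens.

    attn_row   -- attention weights over all input tokens (hypothetical input)
    image_mask -- True where the corresponding token is an image token
    """
    if len(attn_row) != len(image_mask):
        raise ValueError("attn_row and image_mask must have the same length")
    total = sum(attn_row)
    if total <= 0:
        raise ValueError("attention weights must sum to a positive value")
    on_image = sum(w for w, is_img in zip(attn_row, image_mask) if is_img)
    return on_image / total


def relevant_region_share(attn_row: Sequence[float],
                          image_mask: Sequence[bool],
                          relevant_mask: Sequence[bool]) -> float:
    """Of the attention that lands on image tokens, the fraction that falls
    on tokens marked as task-relevant (assumed to be annotated separately)."""
    if not (len(attn_row) == len(image_mask) == len(relevant_mask)):
        raise ValueError("all inputs must have the same length")
    on_image = sum(w for w, is_img in zip(attn_row, image_mask) if is_img)
    if on_image == 0:
        return 0.0  # no attention on the image at all
    on_relevant = sum(
        w for w, is_img, is_rel in zip(attn_row, image_mask, relevant_mask)
        if is_img and is_rel
    )
    return on_relevant / on_image


if __name__ == "__main__":
    # Toy numbers for illustration only; not results from the paper.
    attn = [0.1, 0.2, 0.3, 0.4]           # one attention row over 4 tokens
    is_image = [False, True, True, False]  # tokens 1 and 2 are image tokens
    is_relevant = [False, True, False, False]
    print(image_attention_share(attn, is_image))
    print(relevant_region_share(attn, is_image, is_relevant))
```

Under these assumptions, comparing the two quantities for the same image and question across open-ended, multiple-choice, and yes/no framings would give a per-framing view of the amount and the focus of visual attention.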
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Neurobiology of Language and Bilingualism
