Head-Aware Visual Cropping: Enhancing Fine-Grained VQA with Attention-Guided Subimage
Junfei Xie, Peng Pan, Xulong Zhang

TL;DR
This paper introduces HAVC, a training-free method that enhances fine-grained visual reasoning in MLLMs by selectively refining attention heads to produce more accurate visual grounding and cropping guidance.
Contribution
HAVC is a novel, training-free approach that filters and refines attention heads for improved visual grounding and cropping in VQA tasks, outperforming existing strategies.
Findings
HAVC improves localization accuracy on fine-grained VQA benchmarks.
It outperforms state-of-the-art cropping strategies.
HAVC enhances visual grounding and task relevance in MLLMs.
Abstract
Multimodal Large Language Models (MLLMs) show strong performance in Visual Question Answering (VQA) but remain limited in fine-grained reasoning due to low-resolution inputs and noisy attention aggregation. We propose Head Aware Visual Cropping (HAVC), a training-free method that improves visual grounding by leveraging a selectively refined subset of attention heads. HAVC first filters heads through an OCR-based diagnostic task, ensuring that only those with genuine grounding ability are retained. At inference, these heads are further refined using spatial entropy for stronger spatial concentration and gradient sensitivity for predictive contribution. The fused signals produce a reliable Visual Cropping Guidance Map, which highlights the most task-relevant region and guides the cropping of a subimage subsequently provided to the MLLM together with the image-question pair.…
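As a rough illustration of the inference-time steps described in the abstract, the Python sketch below scores each retained head by spatial entropy and a gradient-based weight, fuses the best heads into a guidance map, and takes a bounding box around the map's high-value region. This is not the authors' implementation. The function names, the `keep` and `thresh` parameters, the multiplicative scoring rule, and the thresholded bounding box are all assumptions made for illustration. The gradient sensitivities are taken as precomputed inputs, and the OCR-based head filtering is assumed to have been done already.

```python
import numpy as np


def spatial_entropy(attn):
    """Shannon entropy of one head's attention map (lower = more concentrated)."""
    p = attn.ravel().astype(np.float64)
    p = p / p.sum()          # normalize to a probability distribution
    p = p[p > 0]             # skip zero cells, since 0 * log(0) is taken as 0
    return float(-(p * np.log(p)).sum())


def guidance_map(head_maps, grad_scores, keep=2):
    """Fuse the `keep` best heads into one map scaled to [0, 1].

    Assumption: a head's score is (1 - normalized entropy) * gradient score,
    so concentrated heads with high predictive contribution dominate.
    """
    ent = np.array([spatial_entropy(m) for m in head_maps])
    max_ent = np.log(head_maps[0].size)   # entropy of a uniform map
    score = (1.0 - ent / max_ent) * np.asarray(grad_scores, dtype=np.float64)
    top = np.argsort(score)[::-1][:keep]  # indices of the highest-scoring heads
    fused = sum(score[i] * head_maps[i] / head_maps[i].sum() for i in top)
    return fused / fused.max()


def crop_box(gmap, thresh=0.5):
    """Bounding box (top, left, bottom, right) of cells at or above `thresh`."""
    ys, xs = np.nonzero(gmap >= thresh)
    return int(ys.min()), int(xs.min()), int(ys.max()) + 1, int(xs.max()) + 1
```

With patch-level attention maps of shape `(H, W)` per head, `crop_box(guidance_map(maps, grads))` returns patch-grid coordinates, which would still need to be scaled to pixel coordinates before cropping the image.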
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
