Less Detail, Better Answers: Degradation-Driven Prompting for VQA

Haoxuan Han; Weijie Wang; Zeyu Zhang; Yefei He; Bohan Zhuang

arXiv:2604.04838 · cs.CV · April 8, 2026

TL;DR

Degradation-Driven Prompting (DDP) enhances VQA by intentionally reducing image details to focus models on essential structures, improving reasoning accuracy on challenging benchmarks.

Contribution

Introducing DDP, a novel framework that strategically degrades images and uses structural prompts to improve VQA performance and reasoning accuracy.

Findings

01 DDP improves VQA accuracy on challenging benchmarks.

02 Degrading images helps models focus on structural information.

03 Structural prompts combined with degradation outperform baseline methods.

Abstract

Recent advancements in Vision-Language Models (VLMs) have significantly pushed the boundaries of Visual Question Answering (VQA). However, high-resolution details can sometimes become noise that leads to hallucinations or reasoning errors. In this paper, we propose Degradation-Driven Prompting (DDP), a novel framework that improves VQA performance by strategically reducing image fidelity to force models to focus on essential structural information. We evaluate DDP across two distinct tasks. Physical attributes targets images prone to human misjudgment, where DDP employs a combination of 80p downsampling, structural visual aids (white background masks and orthometric lines), and In-Context Learning (ICL) to calibrate the model's focus. Perceptual phenomena addresses various machine-susceptible visual anomalies and illusions, including Visual Anomaly (VA), Color (CI), Motion (MI), Gestalt…
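The downsampling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes that "80p" means resizing the image to a height of 80 pixels while keeping the aspect ratio, and it uses plain nearest-neighbour sampling on a row-major grid of pixel values. The function names `target_size` and `downsample` are hypothetical and chosen only for illustration.

```python
# Hypothetical sketch, not the DDP implementation.
# Assumption: "80p" is read as "resize to a height of 80 pixels".


def target_size(width, height, target_height=80):
    """Return (new_width, new_height), preserving the aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = target_height / height
    new_width = max(1, round(width * scale))
    return new_width, target_height


def downsample(pixels, target_height=80):
    """Nearest-neighbour resize of a row-major grid of pixel values.

    `pixels` is a list of rows; each row is a list of pixel values
    (ints, tuples, or anything else, since values are copied as-is).
    """
    height = len(pixels)
    width = len(pixels[0])
    new_w, new_h = target_size(width, height, target_height)
    out = []
    for y in range(new_h):
        # Sample the source row under the centre of the output row.
        src_y = min(height - 1, int((y + 0.5) * height / new_h))
        row = pixels[src_y]
        out.append([
            row[min(width - 1, int((x + 0.5) * width / new_w))]
            for x in range(new_w)
        ])
    return out
```

Under this reading, `target_size(1920, 1080)` returns `(142, 80)`, so a full-HD image keeps under 1% of its original pixels. An image library's area-averaging resize would be a reasonable substitute for the nearest-neighbour loop; which filter the paper uses is not stated in the abstract.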

Code & Models

Repositories

ziplab/DDP (GitHub)
