DistortBench: Benchmarking Vision Language Models on Image Distortion Identification
Divyanshu Goyal, Akhil Eppa, Vanya Bannihatti Kumar

TL;DR
DistortBench is a comprehensive benchmark designed to evaluate vision-language models' ability to recognize and interpret various image distortions, revealing significant gaps in their low-level perceptual understanding.
Contribution
This work introduces DistortBench, a diagnostic benchmark of 13,500 four-choice questions that tests how well VLMs perceive image distortions across 27 distortion types, six perceptual categories, and five severity levels.
Findings
Best VLM achieves only 61.9% accuracy on distortion recognition.
Performance shows weak and non-monotonic scaling with model size.
Most models struggle with low-level perceptual understanding; even the best model falls just below the human majority-vote baseline of 65.7%.
Abstract
Vision-language models (VLMs) are increasingly used in settings where sensitivity to low-level image degradations matters, including content moderation, image restoration, and quality monitoring. Yet their ability to recognize distortion type and severity remains poorly understood. We present DistortBench, a diagnostic benchmark for no-reference distortion perception in VLMs. DistortBench contains 13,500 four-choice questions covering 27 distortion types, six perceptual categories, and five severity levels: 25 distortions inherit KADID-10k calibrations, while two added rotation distortions use monotonic angle-based levels. We evaluate 18 VLMs, including 17 open-weight models from five families and one proprietary model. Despite strong performance on high-level vision-language tasks, the best model reaches only 61.9% accuracy, just below the human majority-vote baseline of 65.7% (average…
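The headline numbers in the abstract are easier to read against two reference points: the chance rate of a four-choice question and the size of each (distortion type, severity) cell. The sketch below works these out and shows one plausible way to score a model and to form a majority-vote baseline. It is an illustration only, not the authors' evaluation code: the even split of questions across cells, the record field names (`answer`, `prediction`, `severity`), and the toy records are all assumptions.

```python
from collections import Counter, defaultdict

# Counts stated in the abstract.
N_QUESTIONS, N_TYPES, N_LEVELS, N_CHOICES = 13_500, 27, 5, 4

# ASSUMPTION: questions are spread evenly over (type, severity) cells.
# The abstract does not state this; it is just the simplest reading.
questions_per_cell = N_QUESTIONS / (N_TYPES * N_LEVELS)

# Expected accuracy of uniform random guessing among four choices.
chance_accuracy = 1 / N_CHOICES


def majority_vote(votes):
    """Return the most common answer; ties go to the answer seen first."""
    return Counter(votes).most_common(1)[0][0]


def accuracy_by_group(records, key):
    """Accuracy per value of `key` (hypothetical field names throughout)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r[key]] += 1
        hits[r[key]] += int(r["prediction"] == r["answer"])
    return {group: hits[group] / totals[group] for group in totals}


# Toy records, invented purely to exercise the functions above.
toy_records = [
    {"severity": 1, "answer": "A", "prediction": "B"},
    {"severity": 1, "answer": "C", "prediction": "C"},
    {"severity": 5, "answer": "D", "prediction": "D"},
]
per_severity = accuracy_by_group(toy_records, "severity")
human_answer = majority_vote(["B", "A", "B"])
```

Under the even-split assumption each cell holds 100 questions, and random guessing scores 25%, so the reported 61.9% for the best model sits well above chance while remaining below the 65.7% human majority-vote baseline.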
