DistortBench: Benchmarking Vision Language Models on Image Distortion Identification

Divyanshu Goyal; Akhil Eppa; Vanya Bannihatti Kumar

arXiv:2604.19966·cs.CV·April 23, 2026

TL;DR

DistortBench is a diagnostic benchmark that evaluates whether vision-language models can recognize image distortions across 27 types and five severity levels, and it reveals significant gaps in their low-level perceptual understanding.

Contribution

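This work introduces DistortBench, a diagnostic benchmark of 13,500 four-choice questions for assessing how well VLMs perceive image distortions across 27 distortion types and five severity levels. Evaluating 18 VLMs on it shows that even the best model falls short of the human majority-vote baseline.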

Findings

01

Best VLM achieves only 61.9% accuracy on distortion recognition.

02

Performance shows weak and non-monotonic scaling with model size.

03

Most models fall short of human performance on low-level perceptual understanding.

Abstract

Vision-language models (VLMs) are increasingly used in settings where sensitivity to low-level image degradations matters, including content moderation, image restoration, and quality monitoring. Yet their ability to recognize distortion type and severity remains poorly understood. We present DistortBench, a diagnostic benchmark for no-reference distortion perception in VLMs. DistortBench contains 13,500 four-choice questions covering 27 distortion types, six perceptual categories, and five severity levels: 25 distortions inherit KADID-10k calibrations, while two added rotation distortions use monotonic angle-based levels. We evaluate 18 VLMs, including 17 open-weight models from five families and one proprietary model. Despite strong performance on high-level vision-language tasks, the best model reaches only 61.9% accuracy, just below the human majority-vote baseline of 65.7% (average…
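The evaluation setup described in the abstract (four-choice questions scored by accuracy, with a human majority-vote baseline) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the option labels, the function names `accuracy` and `majority_vote`, the tie-breaking rule, and the toy data are all assumptions made for illustration.

```python
from collections import Counter

# Four answer options per question, as stated in the abstract.
# The labels "A".."D" are an assumption; the paper's format may differ.
CHOICES = ("A", "B", "C", "D")

# Accuracy of uniform random guessing on a four-choice question.
CHANCE_ACCURACY = 1 / len(CHOICES)

# Arithmetic on the reported figures: 27 distortion types x 5 severity
# levels gives 135 type/severity conditions. If the 13,500 questions were
# spread evenly (an assumption, not stated in the abstract), that would be
# 100 questions per condition.
NUM_CONDITIONS = 27 * 5
QUESTIONS_PER_CONDITION_IF_BALANCED = 13_500 // NUM_CONDITIONS


def accuracy(predictions, answers):
    """Fraction of questions where the predicted option matches the key."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("need at least one question")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def majority_vote(votes):
    """Most frequent option among annotator votes for one question.

    Ties go to the earliest option in CHOICES. This tie-break is an
    assumed rule, chosen only to keep the sketch deterministic.
    """
    counts = Counter(votes)
    return max(CHOICES, key=lambda option: counts[option])


# Toy data (hypothetical, not taken from the benchmark).
answer_key = ["A", "C", "B", "D"]
model_predictions = ["A", "C", "D", "D"]
annotator_votes = [
    ["A", "A", "B"],
    ["C", "D", "C"],
    ["B", "B", "B"],
    ["A", "D", "A"],
]

model_accuracy = accuracy(model_predictions, answer_key)
human_predictions = [majority_vote(v) for v in annotator_votes]
human_accuracy = accuracy(human_predictions, answer_key)

print(f"chance: {CHANCE_ACCURACY:.2f}")
print(f"model:  {model_accuracy:.2f}")
print(f"human majority vote: {human_accuracy:.2f}")
```

Under this framing, the reported 61.9% for the best model and 65.7% for the human majority vote both sit well above the 25% expected from random guessing among four options.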
