Vision-Language Models vs Human: Perceptual Image Quality Assessment

Imran Mehmood, Imad Ali Shah, Ming Ronnier Luo, and Brian Deegan

arXiv:2603.24578 · cs.CV · March 26, 2026

TL;DR

This study benchmarks Vision-Language Models (VLMs) against human psychophysical data for perceptual image quality assessment and finds that how closely the models track human judgments depends strongly on the image attribute being rated.

Contribution

A systematic evaluation of VLMs for image quality assessment (IQA) that identifies where they do and do not approximate human perceptual judgments across image attributes (contrast, colorfulness, and overall preference).

Findings

01

High correlation between VLM and human judgments of colorfulness (ρ up to 0.93)

02

Models that align well with humans on colorfulness tend to underperform on contrast, and vice versa

03

Model consistency does not always equate to better human alignment
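The alignment figures above (ρ up to 0.93) are correlations between VLM ratings and human psychophysical scores. The page does not state which coefficient ρ denotes; the sketch below assumes Spearman's rank correlation, a common choice in IQA, and is not the authors' code. Under one possible reading of "consistency" (run-to-run agreement), the same function can also correlate two repeated rating runs of a model, which lets you check the pattern in Finding 03 on your own data: a model can agree closely with itself while still ranking images differently from humans. The helper names and sample ratings are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values 1..n, giving each run of tied values the mean of its positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last index of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # 1-based average position of the run
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical colorfulness data for six images (not from the paper):
    # human mean scores and one VLM's ratings from two repeated runs.
    human = [2.1, 3.4, 4.0, 1.5, 3.9, 2.8]
    vlm_run_a = [2, 3, 5, 1, 4, 3]
    vlm_run_b = [2, 4, 5, 1, 4, 3]
    print("human alignment:", spearman(human, vlm_run_a))
    print("run-to-run consistency:", spearman(vlm_run_a, vlm_run_b))
```

Tie-averaged ranks matter here because VLMs typically answer on a coarse integer scale, so many images receive identical ratings.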

Abstract

Psychophysical experiments remain the most reliable approach for perceptual image quality assessment (IQA), yet their cost and limited scalability encourage automated approaches. We investigate whether Vision-Language Models (VLMs) can approximate human perceptual judgments across three image quality scales: contrast, colorfulness, and overall preference. This work presents a systematic benchmark of six VLMs, four proprietary and two open-weight, against human psychophysical data. The results reveal strong attribute-dependent variability: models with high human alignment for colorfulness (ρ up to 0.93) underperform on contrast, and vice versa. Attribute weighting analysis further shows that most VLMs assign higher weights to colorfulness than to contrast when evaluating overall preference…
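The attribute weighting analysis is not described in detail on this page. One common way to estimate such weights, shown here purely as an assumption rather than as the authors' procedure, is to regress overall-preference ratings on contrast and colorfulness ratings and compare the standardized least-squares coefficients. With all variables z-scored, the two-predictor solution reduces to a closed form in the pairwise correlations. The function name `attribute_weights` and the sample ratings are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / (sxx * syy) ** 0.5


def attribute_weights(contrast, colorfulness, preference):
    """Standardized OLS coefficients for preference ~ contrast + colorfulness.

    With all three variables z-scored, the two-predictor least-squares
    solution can be written directly in terms of pairwise correlations.
    """
    r_cp = pearson(contrast, preference)
    r_fp = pearson(colorfulness, preference)
    r_cf = pearson(contrast, colorfulness)
    denom = 1.0 - r_cf ** 2
    if denom < 1e-12:
        raise ValueError("contrast and colorfulness are collinear; weights are not identifiable")
    w_contrast = (r_cp - r_cf * r_fp) / denom
    w_colorfulness = (r_fp - r_cf * r_cp) / denom
    return w_contrast, w_colorfulness


if __name__ == "__main__":
    # Hypothetical per-image ratings from one VLM (not data from the paper).
    contrast = [3, 4, 2, 5, 3, 1]
    colorfulness = [2, 5, 3, 4, 4, 1]
    preference = [2, 5, 3, 5, 4, 1]
    w_c, w_f = attribute_weights(contrast, colorfulness, preference)
    print(f"contrast weight={w_c:.2f}, colorfulness weight={w_f:.2f}")
```

Under this sketch, a colorfulness weight larger than the contrast weight would correspond to the pattern the abstract reports for most VLMs. Standardizing the ratings puts both weights on a common scale, so they stay comparable even if the two attributes are rated on different ranges.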

Taxonomy

Topics: Image and Video Quality Assessment · Visual Attention and Saliency Detection · Aesthetic Perception and Analysis