Investigate the Low-level Visual Perception in Vision-Language based Image Quality Assessment

Yuan Li; Zitang Sun; Yen-Ju Chen; Shin'ya Nishida

arXiv:2512.09573·cs.CV·December 11, 2025

Open Access

TL;DR

This paper investigates whether vision-language models can detect low-level visual distortions. It finds that targeted fine-tuning of the vision encoder substantially improves both the recognition of such distortions and the interpretability of the model's output.

Contribution

The paper introduces a low-level distortion perception task and shows that aligning the vision encoder improves distortion recognition in MLLMs.

Findings

01 MLLMs tend to overfit training templates for distortions

02 Improving vision encoder alignment increases distortion recognition accuracy from 14.92% to 84.43%

03 Dedicated constraints on the vision encoder strengthen visual representations

Abstract

Recent advances in Image Quality Assessment (IQA) have leveraged Multi-modal Large Language Models (MLLMs) to generate descriptive explanations. However, despite their strong visual perception modules, these models often fail to reliably detect basic low-level distortions such as blur, noise, and compression, and may produce inconsistent evaluations across repeated inferences. This raises an essential question: do MLLM-based IQA systems truly perceive the visual features that matter? To examine this issue, we introduce a low-level distortion perception task that requires models to classify specific distortion types. Our component-wise analysis shows that although MLLMs are structurally capable of representing such distortions, they tend to overfit training templates, leading to biases in quality scoring. As a result, critical low-level features are weakened or lost during the…
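To make the distortion perception task concrete, here is a minimal scoring sketch in Python. It is not the authors' evaluation code. The label set (blur, noise, compression) is taken from the examples named in the abstract, while the keyword stems, the function names `parse_distortion` and `distortion_accuracy`, and the rule that ambiguous answers count as wrong are all assumptions made for illustration.

```python
from typing import Optional, Sequence

# Assumed label set and keyword stems (hypothetical; the abstract only
# names blur, noise, and compression as example distortions).
KEYWORDS = {
    "blur": ("blur",),
    "noise": ("nois",),
    "compression": ("compress", "jpeg"),
}


def parse_distortion(answer: str) -> Optional[str]:
    """Map a free-text model answer to exactly one distortion label.

    Returns None when the answer mentions no label or more than one,
    so ambiguous answers are never credited as correct.
    """
    text = answer.lower()
    hits = [
        label
        for label, stems in KEYWORDS.items()
        if any(stem in text for stem in stems)
    ]
    return hits[0] if len(hits) == 1 else None


def distortion_accuracy(answers: Sequence[str], truth: Sequence[str]) -> float:
    """Fraction of answers whose parsed label matches the ground truth."""
    if len(answers) != len(truth):
        raise ValueError("answers and truth must have the same length")
    if not truth:
        return 0.0
    correct = sum(parse_distortion(a) == t for a, t in zip(answers, truth))
    return correct / len(truth)


# Illustrative inputs only; these are not results from the paper.
answers = [
    "The image looks blurry.",
    "Heavy JPEG compression artifacts are visible.",
    "It is noisy and blurred.",  # two labels mentioned, treated as ambiguous
    "Looks fine to me.",         # no label mentioned
]
truth = ["blur", "compression", "noise", "noise"]
score = distortion_accuracy(answers, truth)
```

Running the same scorer over repeated inferences on one image would also expose the inconsistency the abstract mentions, since differing parsed labels across runs show up directly.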

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Generative Adversarial Networks and Image Synthesis