Investigate the Low-level Visual Perception in Vision-Language based Image Quality Assessment
Yuan Li, Zitang Sun, Yen-Ju Chen, Shin'ya Nishida

TL;DR
This paper investigates the ability of vision-language models to detect low-level visual distortions, revealing that targeted fine-tuning of the vision encoder significantly improves how well these models perceive and interpret such distortions.
Contribution
It introduces a low-level distortion perception task and demonstrates that vision encoder alignment enhances distortion recognition in MLLMs.
Findings
MLLMs tend to overfit training templates for distortions
Improving vision encoder alignment increases distortion recognition accuracy from 14.92% to 84.43%
Dedicated constraints on the vision encoder strengthen visual representations
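As a quick check on the reported accuracy figures, the change can be expressed both as an absolute gain and as a relative ratio. This is simple arithmetic on the two numbers above, not a figure reported by the authors:

```latex
\Delta_{\text{abs}} = 84.43\% - 14.92\% = 69.51 \text{ percentage points}, \qquad
\frac{84.43}{14.92} \approx 5.66\times
```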
Abstract
Recent advances in Image Quality Assessment (IQA) have leveraged Multi-modal Large Language Models (MLLMs) to generate descriptive explanations. However, despite their strong visual perception modules, these models often fail to reliably detect basic low-level distortions such as blur, noise, and compression, and may produce inconsistent evaluations across repeated inferences. This raises an essential question: do MLLM-based IQA systems truly perceive the visual features that matter? To examine this issue, we introduce a low-level distortion perception task that requires models to classify specific distortion types. Our component-wise analysis shows that although MLLMs are structurally capable of representing such distortions, they tend to overfit training templates, leading to biases in quality scoring. As a result, critical low-level features are weakened or lost during the…
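The abstract describes the distortion perception task only at a high level. The following is a hypothetical, minimal sketch of how such a classification benchmark could be assembled and scored. It assumes grayscale images stored as nested lists, a 3x3 box blur for "blur", clipped Gaussian noise for "noise", and coarse quantization as a crude proxy for "compression" (real JPEG artifacts would need an image library). The names `build_task`, `accuracy`, and the `predict` callable, which stands in for an MLLM queried with a multiple-choice prompt, are illustrative assumptions, not the authors' code.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: a grayscale image as rows of floats in [0, 1].
Image = List[List[float]]

# Distortion labels named in the abstract; the exact label set is an assumption.
DISTORTIONS = ("blur", "noise", "compression")


def box_blur(img: Image) -> Image:
    """3x3 mean filter with edge clamping, a simple stand-in for blur."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            row.append(sum(window) / 9.0)
        out.append(row)
    return out


def add_noise(img: Image, rng: random.Random, sigma: float = 0.1) -> Image:
    """Additive Gaussian noise, clipped back to [0, 1]."""
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row] for row in img]


def quantize(img: Image, levels: int = 4) -> Image:
    """Coarse quantization: a crude proxy for compression artifacts, NOT real JPEG."""
    step = 1.0 / (levels - 1)
    return [[round(p / step) * step for p in row] for row in img]


def build_task(images: Sequence[Image], rng: random.Random) -> List[Tuple[Image, str]]:
    """Pair each image with one distortion, cycling labels so classes stay balanced."""
    samples = []
    for i, img in enumerate(images):
        label = DISTORTIONS[i % len(DISTORTIONS)]
        if label == "blur":
            distorted = box_blur(img)
        elif label == "noise":
            distorted = add_noise(img, rng)
        else:
            distorted = quantize(img)
        samples.append((distorted, label))
    return samples


def accuracy(samples: Sequence[Tuple[Image, str]], predict: Callable[[Image], str]) -> float:
    """Fraction of samples whose predicted distortion type matches the label."""
    if not samples:
        raise ValueError("no samples to evaluate")
    correct = sum(1 for img, label in samples if predict(img) == label)
    return correct / len(samples)


if __name__ == "__main__":
    rng = random.Random(0)
    images = [[[rng.random() for _ in range(8)] for _ in range(8)] for _ in range(6)]
    samples = build_task(images, rng)
    # A trivial baseline that always answers "blur" scores 2/6 on balanced classes.
    print(accuracy(samples, lambda img: "blur"))
```

Cycling the labels keeps the three classes balanced, so a model that always gives the same answer scores exactly one third, which gives a chance-level floor to compare a real model's predictions against.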
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Generative Adversarial Networks and Image Synthesis
