Revisiting MLLM Based Image Quality Assessment: Errors and Remedy
Zhenchen Tang, Songlin Yang, Bo Peng, Zichuan Wang, Jing Dong

TL;DR
This paper identifies the errors that arise when MLLM-based image quality assessment (IQA) methods convert discrete token outputs into continuous quality scores, and introduces Q-Scorer, a framework that significantly improves performance by addressing these token-to-score conversion issues.
Contribution
The paper provides a theoretical analysis of the errors in previous MLLM-based IQA methods and proposes Q-Scorer, a simple framework that combines a regression module with IQA-specific tokens and achieves state-of-the-art results on multiple IQA benchmarks.
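As an illustration of the alternative the contribution describes, here is a minimal sketch of a regression module that reads the final-layer hidden state at an IQA-specific token and maps it directly to a real-valued score, so no vocabulary lookup or level-to-number conversion is involved. The two-layer MLP, the hypothetical `score_token_id`, and the L1 loss are assumptions for illustration; the paper's actual architecture and training objective may differ. Plain Python lists stand in for tensors to keep the sketch self-contained.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RegressionHead:
    """Hypothetical two-layer MLP mapping one token's hidden state to a score."""
    w1: List[List[float]]  # one row of d_model weights per hidden unit
    b1: List[float]        # one bias per hidden unit
    w2: List[float]        # one output weight per hidden unit
    b2: float              # output bias

    def __call__(self, h: List[float]) -> float:
        # First layer followed by ReLU.
        z = [max(0.0, sum(w * x for w, x in zip(row, h)) + b)
             for row, b in zip(self.w1, self.b1)]
        # Second layer yields a single real number: no token vocabulary involved.
        return sum(w * x for w, x in zip(self.w2, z)) + self.b2


def score_from_hidden_states(hidden_states: List[List[float]],
                             token_ids: List[int],
                             score_token_id: int,
                             head: RegressionHead) -> float:
    """Regress a score from the hidden state at the (hypothetical) IQA token.

    hidden_states[i] is the final-layer hidden state for token_ids[i];
    list.index picks the first occurrence of the score token.
    """
    pos = token_ids.index(score_token_id)
    return head(hidden_states[pos])


def l1_loss(pred: float, mos: float) -> float:
    """Absolute error against a mean opinion score (MOS); an assumed objective."""
    return abs(pred - mos)
```

In practice such a head would be trained against mean opinion scores together with, or on top of, the MLLM; because its output is continuous, it can predict a value such as 4.4 directly instead of approximating it through level-token probabilities.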
Findings
Q-Scorer outperforms existing methods on multiple benchmarks.
It generalizes well to mixed datasets.
Combining Q-Scorer with other methods yields further improvements.
Abstract
The rapid progress of multi-modal large language models (MLLMs) has advanced the task of image quality assessment (IQA). However, a key challenge arises from the inherent mismatch between the discrete token outputs of MLLMs and the continuous nature of the quality scores required by IQA tasks. This discrepancy significantly hinders the performance of MLLM-based IQA methods. Previous approaches that convert discrete token predictions into continuous scores often suffer from conversion errors. Moreover, the semantic confusion introduced by level tokens (e.g., "good") further constrains the performance of MLLMs on IQA tasks and degrades their original capabilities on related tasks. To tackle these problems, we provide a theoretical analysis of the errors inherent in previous approaches and, motivated by this analysis, propose a simple yet effective framework, Q-Scorer. This framework…
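To make the token-to-score conversion concrete: a widely used scheme in earlier MLLM-IQA work (Q-Align is one example) restricts the model's next-token distribution to five level tokens, renormalizes it with a softmax, and takes the expected value under a fixed level-to-number mapping. This sketch reproduces only that generic mechanism, not the paper's error analysis; the token names and the 5-to-1 mapping are assumptions, and the logits are passed in as a plain dictionary rather than read from a real model.

```python
import math

# Assumed level tokens and numeric values (the five-level scheme used by,
# e.g., Q-Align); not taken from this paper.
LEVELS = {"excellent": 5.0, "good": 4.0, "fair": 3.0, "poor": 2.0, "bad": 1.0}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def level_tokens_to_score(level_logits):
    """Convert level-token logits into a continuous score.

    level_logits maps each level token to the logit the MLLM assigns it at
    the answer position; only these five tokens are renormalized, and the
    score is the probability-weighted average of their numeric values.
    """
    names = list(LEVELS)
    probs = softmax([level_logits[n] for n in names])
    return sum(p * LEVELS[n] for p, n in zip(probs, names))
```

With uniform logits the score is 3.0, and a confident prediction of "good" yields a score very close to 4.0 whether the ground-truth score is 3.6 or 4.4: a sharp distribution pins the output near one level value, which is one way this kind of discretization can limit precision.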
Taxonomy
Topics: Image and Video Quality Assessment · Visual Attention and Saliency Detection · Advanced Image Processing Techniques
