Revisiting MLLM Based Image Quality Assessment: Errors and Remedy

Zhenchen Tang; Songlin Yang; Bo Peng; Zichuan Wang; Jing Dong

arXiv:2511.07812 · cs.CV · November 12, 2025

TL;DR

This paper identifies key errors in MLLM-based image quality assessment and introduces Q-Scorer, a new framework that significantly improves performance by addressing token-to-score conversion issues.

Contribution

The paper provides a theoretical analysis of errors in previous MLLM-IQA methods and proposes Q-Scorer, a simple framework with a regression module and IQA-specific tokens, achieving state-of-the-art results.

Findings

01 Q-Scorer outperforms existing methods on multiple benchmarks.

02 Q-Scorer generalizes well to mixed datasets.

03 Combining Q-Scorer with other methods yields further improvements.

Abstract

The rapid progress of multi-modal large language models (MLLMs) has boosted the task of image quality assessment (IQA). However, a key challenge arises from the inherent mismatch between the discrete token outputs of MLLMs and the continuous nature of quality scores required by IQA tasks. This discrepancy significantly hinders the performance of MLLM-based IQA methods. Previous approaches that convert discrete token predictions into continuous scores often suffer from conversion errors. Moreover, the semantic confusion introduced by level tokens (e.g., "good") further constrains the performance of MLLMs on IQA tasks and degrades their original capabilities for related tasks. To tackle these problems, we provide a theoretical analysis of the errors inherent in previous approaches and, motivated by this analysis, propose a simple yet effective framework, Q-Scorer. This framework…
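
To make the token-to-score mismatch concrete, here is a minimal sketch of one commonly used conversion: a softmax over the logits of a few level tokens, followed by a probability-weighted average of the level values. The five level names, their values 1 to 5, and every function name are assumptions chosen for illustration. This is not the paper's formulation and not Q-Scorer itself.

```python
import math

# Hypothetical rating vocabulary: five level tokens with assumed values 1..5.
LEVELS = {"bad": 1.0, "poor": 2.0, "fair": 3.0, "good": 4.0, "excellent": 5.0}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_score(level_logits):
    """Convert level-token logits into a continuous score.

    Computes the probability-weighted average of the level values.
    `level_logits` maps each level name in LEVELS to a logit.
    """
    probs = softmax([level_logits[name] for name in LEVELS])
    return sum(p * value for p, value in zip(probs, LEVELS.values()))


def nearest_level(score):
    """Discretize a continuous score to the closest level token."""
    return min(LEVELS, key=lambda name: abs(LEVELS[name] - score))


def round_trip_error(score):
    """Absolute error after mapping a score to one level token and back."""
    return abs(score - LEVELS[nearest_level(score)])


if __name__ == "__main__":
    # A ground-truth score of 3.4 is labeled "fair" (value 3.0), so part of
    # the score is lost before the model ever sees the training target.
    print(nearest_level(3.4), round_trip_error(3.4))
    # Uniform logits spread probability evenly over the five levels.
    print(expected_score({name: 0.0 for name in LEVELS}))
```

In this toy setting, a score that falls between two levels cannot be recovered from a single level token, which is one simple way a discrete-to-continuous conversion can introduce error.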

Code & Models

Models

🤗 2kxx/Qscorer_lora_5_1
model · 4 dl

Datasets

2kxx/Q-Scorer
dataset · 7 dl

Videos

Revisiting MLLM Based Image Quality Assessment: Errors and Remedy · underline

Taxonomy

Topics: Image and Video Quality Assessment · Visual Attention and Saliency Detection · Advanced Image Processing Techniques