A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment

Tianhe Wu; Kede Ma; Jie Liang; Yujiu Yang; Lei Zhang

arXiv:2403.10854·cs.CV·July 12, 2024·2 cites

TL;DR

This paper systematically evaluates multimodal large language models for image quality assessment, revealing GPT-4V's limited ability to discriminate fine details and compare multiple images, despite its overall reasonable performance.

Contribution

The paper provides a comprehensive analysis of prompting strategies for MLLMs in IQA and introduces a difficult-sample selection procedure, based on sample diversity and uncertainty, to challenge the models further.

Findings

01. GPT-4V aligns with human perception but struggles with fine-grained differences.

02. MLLMs are less effective in multi-image comparison tasks.

03. Prompting strategies significantly influence MLLMs' IQA performance.

Abstract

While Multimodal Large Language Models (MLLMs) have experienced significant advancement in visual understanding and reasoning, their potential to serve as powerful, flexible, interpretable, and text-driven models for Image Quality Assessment (IQA) remains largely unexplored. In this paper, we conduct a comprehensive and systematic study of prompting MLLMs for IQA. We first investigate nine prompting systems for MLLMs as the combinations of three standardized testing procedures in psychophysics (i.e., the single-stimulus, double-stimulus, and multiple-stimulus methods) and three popular prompting strategies in natural language processing (i.e., the standard, in-context, and chain-of-thought prompting). We then present a difficult sample selection procedure, taking into account sample diversity and uncertainty, to further challenge MLLMs equipped with the respective optimal prompting…
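
The nine prompting systems mentioned in the abstract are the cross product of the three psychophysical testing procedures and the three prompting strategies. The sketch below only enumerates those pairs to make the combinatorics concrete. The identifiers `PROCEDURES`, `STRATEGIES`, and `prompting_systems` are illustrative assumptions, not names from the paper or its repositories, and no actual prompt text is reproduced here.

```python
from itertools import product

# Labels come from the abstract; the variable and function names are
# hypothetical and chosen only for this illustration.
PROCEDURES = ("single-stimulus", "double-stimulus", "multiple-stimulus")
STRATEGIES = ("standard", "in-context", "chain-of-thought")


def prompting_systems():
    """Return every (procedure, strategy) pair, with procedures varying slowest."""
    return list(product(PROCEDURES, STRATEGIES))


if __name__ == "__main__":
    # Print a numbered list of the nine combinations.
    for index, (procedure, strategy) in enumerate(prompting_systems(), start=1):
        print(f"{index}. {procedure} + {strategy} prompting")
```

Each pair would then be evaluated separately, so that the best-performing combination for a given model can be identified before the harder samples are applied.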

Taxonomy

Topics: Advanced Image Fusion Techniques