Q-Bench-Portrait: Benchmarking Multimodal Large Language Models on Portrait Image Quality Perception

Sijing Wu; Yunhao Li; Zicheng Zhang; Qi Jia; Xinyue Li; Huiyu Duan; Xiongkuo Min; Guangtao Zhai

arXiv:2601.18346 · cs.CV · January 27, 2026

TL;DR

This paper introduces Q-Bench-Portrait, a benchmark for evaluating how well multimodal large language models (MLLMs) perceive and assess portrait image quality across technical distortions, AIGC-specific distortions, and aesthetics.

Contribution

It presents the first dedicated benchmark for portrait image quality perception, including diverse image types, quality dimensions, and question formats, enabling systematic evaluation of MLLMs.

Findings

01. Current MLLMs show limited and imprecise perception of portrait image quality.

02. Models perform significantly below human judgment levels.

03. The benchmark reveals substantial gaps in model capabilities.

Abstract

Recent advances in multimodal large language models (MLLMs) have demonstrated impressive performance on existing low-level vision benchmarks, which primarily focus on generic images. However, their capabilities to perceive and assess portrait images, a domain characterized by distinct structural and perceptual properties, remain largely underexplored. To this end, we introduce Q-Bench-Portrait, the first holistic benchmark specifically designed for portrait image quality perception, comprising 2,765 image-question-answer triplets and featuring (1) diverse portrait image sources, including natural, synthetic distortion, AI-generated, artistic, and computer graphics images; (2) comprehensive quality dimensions, covering technical distortions, AIGC-specific distortions, and aesthetics; and (3) a range of question formats, including single-choice, multiple-choice, true/false, and open-ended…
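As a rough illustration of the structure the abstract describes, the sketch below models one image-question-answer triplet and scores the closed-form question formats. It is not the authors' data schema or evaluation code: the names `Triplet`, `QuestionFormat`, `is_correct`, and `accuracy`, the field layout, and the exact-match rule for multiple-choice answers are all assumptions made for illustration. Open-ended answers are skipped because they would need a separate judging procedure that the abstract excerpt does not specify.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Iterable, Sequence


class QuestionFormat(Enum):
    # The four question formats named in the abstract.
    SINGLE_CHOICE = "single-choice"
    MULTIPLE_CHOICE = "multiple-choice"
    TRUE_FALSE = "true/false"
    OPEN_ENDED = "open-ended"


@dataclass(frozen=True)
class Triplet:
    # Hypothetical record layout; the real benchmark files may differ.
    image_path: str
    question: str
    answer: FrozenSet[str]  # correct option label(s); empty for open-ended
    fmt: QuestionFormat


def is_correct(item: Triplet, predicted: Iterable[str]) -> bool:
    """Exact-match scoring for closed-form questions (an assumed rule)."""
    if item.fmt is QuestionFormat.OPEN_ENDED:
        raise ValueError("open-ended answers need a separate judge")
    return frozenset(predicted) == item.answer


def accuracy(items: Sequence[Triplet], predictions: Sequence[Iterable[str]]) -> float:
    """Fraction of closed-form questions answered exactly right."""
    closed = [
        (item, pred)
        for item, pred in zip(items, predictions)
        if item.fmt is not QuestionFormat.OPEN_ENDED
    ]
    if not closed:
        return 0.0
    return sum(is_correct(item, pred) for item, pred in closed) / len(closed)


# Made-up example items, not drawn from the benchmark.
items = [
    Triplet("a.jpg", "Is the face out of focus?", frozenset({"True"}),
            QuestionFormat.TRUE_FALSE),
    Triplet("b.jpg", "Which distortions are present?", frozenset({"A", "C"}),
            QuestionFormat.MULTIPLE_CHOICE),
    Triplet("c.jpg", "Describe the lighting quality.", frozenset(),
            QuestionFormat.OPEN_ENDED),
]
predictions = [{"True"}, {"A"}, set()]
score = accuracy(items, predictions)
```

Under this exact-match assumption, a multiple-choice prediction that selects only some of the correct options counts as wrong; a partial-credit rule would be an equally plausible design choice.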

Taxonomy

Topics: Multimodal Machine Learning Applications · Visual Attention and Saliency Detection · Generative Adversarial Networks and Image Synthesis