Q-Bench-Portrait: Benchmarking Multimodal Large Language Models on Portrait Image Quality Perception
Sijing Wu, Yunhao Li, Zicheng Zhang, Qi Jia, Xinyue Li, Huiyu Duan, Xiongkuo Min, Guangtao Zhai

TL;DR
This paper introduces Q-Bench-Portrait, a comprehensive benchmark for evaluating multimodal large language models' ability to perceive and assess portrait image quality across various distortions and aesthetic dimensions.
Contribution
It presents the first dedicated benchmark for portrait image quality perception, including diverse image types, quality dimensions, and question formats, enabling systematic evaluation of MLLMs.
Findings
Current MLLMs show limited and imprecise portrait perception.
Models perform significantly below human judgment levels.
The benchmark reveals substantial gaps in model capabilities.
Abstract
Recent advances in multimodal large language models (MLLMs) have demonstrated impressive performance on existing low-level vision benchmarks, which primarily focus on generic images. However, their ability to perceive and assess portrait images, a domain characterized by distinct structural and perceptual properties, remains largely underexplored. To this end, we introduce Q-Bench-Portrait, the first holistic benchmark specifically designed for portrait image quality perception, comprising 2,765 image-question-answer triplets and featuring (1) diverse portrait image sources, including natural, synthetic distortion, AI-generated, artistic, and computer graphics images; (2) comprehensive quality dimensions, covering technical distortions, AIGC-specific distortions, and aesthetics; and (3) a range of question formats, including single-choice, multiple-choice, true/false, and open-ended…
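The abstract describes the benchmark as image-question-answer triplets tagged along three axes: image source, quality dimension, and question format. The sketch below shows one hypothetical way to represent such records and compute per-group accuracy on the closed-form formats. The class names, field layout, and the exact-set-match rule for multiple-choice answers are assumptions for illustration, not the paper's released schema or scoring protocol; open-ended items are skipped because they would need a separate grader.

```python
from collections import defaultdict
from collections.abc import Hashable
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, FrozenSet, List


class QuestionFormat(Enum):
    SINGLE_CHOICE = "single-choice"
    MULTIPLE_CHOICE = "multiple-choice"
    TRUE_FALSE = "true/false"
    OPEN_ENDED = "open-ended"


class ImageSource(Enum):
    NATURAL = "natural"
    SYNTHETIC_DISTORTION = "synthetic distortion"
    AI_GENERATED = "AI-generated"
    ARTISTIC = "artistic"
    COMPUTER_GRAPHICS = "computer graphics"


class QualityDimension(Enum):
    TECHNICAL = "technical distortion"
    AIGC = "AIGC-specific distortion"
    AESTHETICS = "aesthetics"


@dataclass(frozen=True)
class Triplet:
    """One hypothetical image-question-answer record (assumed schema)."""
    image_path: str
    question: str
    answer: FrozenSet[str]  # correct option label(s), e.g. {"A"} or {"A", "C"}
    fmt: QuestionFormat
    source: ImageSource
    dimension: QualityDimension


def is_correct(item: Triplet, prediction: FrozenSet[str]) -> bool:
    """Exact-match scoring for closed-form questions (assumed protocol)."""
    if item.fmt is QuestionFormat.OPEN_ENDED:
        raise ValueError("open-ended answers need a separate grader")
    # A multiple-choice answer counts only if the predicted set equals the key.
    return frozenset(prediction) == item.answer


def accuracy_by(
    items: List[Triplet],
    predictions: List[FrozenSet[str]],
    key: Callable[[Triplet], Hashable],
) -> Dict[Hashable, float]:
    """Accuracy grouped by any attribute of a triplet, skipping open-ended items."""
    hits: Dict[Hashable, int] = defaultdict(int)
    totals: Dict[Hashable, int] = defaultdict(int)
    for item, pred in zip(items, predictions):
        if item.fmt is QuestionFormat.OPEN_ENDED:
            continue
        group = key(item)
        totals[group] += 1
        hits[group] += int(is_correct(item, pred))
    return {g: hits[g] / totals[g] for g in totals}
```

Passing `key=lambda t: t.source` or `key=lambda t: t.dimension` instead of `t.fmt` gives a breakdown along the other two axes the abstract lists.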
Taxonomy
Topics: Multimodal Machine Learning Applications · Visual Attention and Saliency Detection · Generative Adversarial Networks and Image Synthesis
