Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision
Haoning Wu, Zicheng Zhang, Erli Zhang, Chaofeng Chen, Liang Liao, Annan Wang, Chunyi Li, Wenxiu Sun, Qiong Yan, Guangtao Zhai, Weisi Lin

TL;DR
Q-Bench is a comprehensive benchmark designed to evaluate multi-modality large language models on low-level visual perception, description, and quality assessment, revealing their current capabilities and areas needing improvement.
Contribution
This work introduces Q-Bench, the first systematic benchmark for assessing low-level visual skills of MLLMs across perception, description, and quality evaluation.
Findings
MLLMs show preliminary low-level visual skills.
Current skills are unstable and imprecise.
The benchmark encourages targeted improvements in MLLMs.
Abstract
The rapid evolution of Multi-modality Large Language Models (MLLMs) has catalyzed a shift in computer vision from specialized models to general-purpose foundation models. Nevertheless, there is still an inadequacy in assessing the abilities of MLLMs on low-level visual perception and understanding. To address this gap, we present Q-Bench, a holistic benchmark crafted to systematically evaluate potential abilities of MLLMs on three realms: low-level visual perception, low-level visual description, and overall visual quality assessment. a) To evaluate the low-level perception ability, we construct the LLVisionQA dataset, consisting of 2,990 diverse-sourced images, each equipped with a human-asked question focusing on its low-level attributes. We then measure the correctness of MLLMs on answering these questions. b) To examine the description ability of MLLMs on low-level information, we…
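The correctness measurement in part (a) can be illustrated as a plain accuracy computation over question-answer records. This is only a minimal sketch: the abstract does not state the scoring protocol, so the exact-match comparison after light normalization, the `QAItem` record, and the `normalize` and `accuracy` helpers are all assumptions made for illustration, not the authors' evaluation code. The sample items are made-up toy data, not entries from LLVisionQA.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class QAItem:
    """One hypothetical evaluation record (field names are assumptions)."""
    question: str
    correct_answer: str
    model_answer: str


def normalize(answer: str) -> str:
    # Assumed normalization: trim whitespace, lowercase, drop trailing periods.
    return answer.strip().lower().rstrip(".")


def accuracy(items: Sequence[QAItem]) -> float:
    """Fraction of items whose normalized model answer matches the reference."""
    if not items:
        raise ValueError("accuracy is undefined for an empty item list")
    hits = sum(
        normalize(item.model_answer) == normalize(item.correct_answer)
        for item in items
    )
    return hits / len(items)


# Toy records for illustration only.
toy_items = [
    QAItem("Is this image blurry?", "Yes", " yes "),
    QAItem("What is the main distortion?", "Noise", "Blur."),
]
toy_score = accuracy(toy_items)
```

Exact matching is the simplest possible choice. A real evaluation of free-form MLLM output would likely need a more tolerant way to map responses onto the candidate answers.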
Taxonomy
Topics: Multimodal Machine Learning Applications · Subtitles and Audiovisual Media
Methods: ALIGN
