Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision

Haoning Wu; Zicheng Zhang; Erli Zhang; Chaofeng Chen; Liang Liao; Annan Wang; Chunyi Li; Wenxiu Sun; Qiong Yan; Guangtao Zhai; Weisi Lin

arXiv:2309.14181·cs.CV·January 2, 2024·20 cites

TL;DR

Q-Bench is a comprehensive benchmark designed to evaluate multi-modality large language models on low-level visual perception, description, and quality assessment, revealing their current capabilities and areas needing improvement.

Contribution

This work introduces Q-Bench, the first systematic benchmark for assessing low-level visual skills of MLLMs across perception, description, and quality evaluation.

Findings

01 MLLMs show preliminary low-level visual skills.

02 Current skills are unstable and imprecise.

03 Benchmark encourages targeted improvements in MLLMs.

Abstract

The rapid evolution of Multi-modality Large Language Models (MLLMs) has catalyzed a shift in computer vision from specialized models to general-purpose foundation models. Nevertheless, there is still an inadequacy in assessing the abilities of MLLMs on low-level visual perception and understanding. To address this gap, we present Q-Bench, a holistic benchmark crafted to systematically evaluate potential abilities of MLLMs on three realms: low-level visual perception, low-level visual description, and overall visual quality assessment. a) To evaluate the low-level perception ability, we construct the LLVisionQA dataset, consisting of 2,990 diverse-sourced images, each equipped with a human-asked question focusing on its low-level attributes. We then measure the correctness of MLLMs on answering these questions. b) To examine the description ability of MLLMs on low-level information, we…
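
The correctness measurement described in part a) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the `QAItem` record, the `normalize` helper, and the exact-match scoring rule are all assumptions made for illustration, and the actual LLVisionQA format and scoring protocol are not given in this listing.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class QAItem:
    """Hypothetical record: one question about an image's low-level attributes."""
    question: str
    correct_answer: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences do not count as errors."""
    return " ".join(text.lower().split())


def accuracy(items: Sequence[QAItem], predictions: Sequence[str]) -> float:
    """Fraction of items whose predicted answer matches the reference after normalization.

    Exact matching is an assumed scoring rule, chosen only to keep the sketch simple.
    """
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    if not items:
        return 0.0
    hits = sum(
        normalize(pred) == normalize(item.correct_answer)
        for item, pred in zip(items, predictions)
    )
    return hits / len(items)


# Made-up example items; these are not drawn from the dataset.
items = [
    QAItem("Is this image blurry?", "Yes"),
    QAItem("How is the lighting of this image?", "Dark"),
    QAItem("Is there noise in this image?", "No"),
    QAItem("What is the main distortion?", "Overexposure"),
]
predictions = ["yes", "Bright", " No ", "overexposure"]
score = accuracy(items, predictions)
```

Exact matching after normalization is the simplest possible rule; free-form MLLM outputs would usually need a more tolerant answer-extraction step before scoring.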

Code & Models

Repositories

Q-Future/Q-Bench (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Subtitles and Audiovisual Media

Methods: ALIGN