Q-Ground: Image Quality Grounding with Large Multi-modality Models

Chaofeng Chen; Sensen Yang; Haoning Wu; Liang Liao; Zicheng Zhang; Annan Wang; Wenxiu Sun; Qiong Yan; Weisi Lin

arXiv:2407.17035·cs.CV·July 25, 2024

TL;DR

Q-Ground introduces a novel framework combining large multi-modality models with a new dataset to enable detailed, region-aware image quality assessment and explanation through text prompts.

Contribution

The paper presents the first approach for fine-scale visual quality grounding using large multi-modality models and introduces the QGround-100K dataset for this purpose.

Findings

1. Effective multi-scale feature learning for quality assessment

2. Dual capability for image quality answering and distortion segmentation

3. Improved robustness with automatically labeled data

Abstract

Recent advances in large multi-modality models (LMMs) have greatly improved the ability of image quality assessment (IQA) methods to evaluate and explain the quality of visual content. However, these advances focus mostly on overall quality assessment, and the detailed examination of local quality, which is crucial for comprehensive visual understanding, remains largely unexplored. In this work, we introduce Q-Ground, the first framework aimed at tackling fine-scale visual quality grounding by combining large multi-modality models with detailed visual quality analysis. Central to our contribution is the QGround-100K dataset, a novel resource containing 100k triplets of (image, quality text, distortion segmentation) to facilitate deep investigations into visual quality. The dataset comprises two parts: one with human-labeled annotations for accurate quality…
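To make the (image, quality text, distortion segmentation) triplet more concrete, here is a minimal Python sketch of how one such sample could be held in memory and summarized. It is an illustration only: the `QualityTriplet` class, the `distortion_coverage` helper, and the three-entry label table are hypothetical and are not taken from the QGround-100K release, whose actual schema and distortion categories are not described in this summary.

```python
from dataclasses import dataclass

import numpy as np

# Hypothetical label table for illustration; the real distortion
# categories in QGround-100K are not listed in this summary.
DISTORTION_LABELS = {0: "clean", 1: "blur", 2: "noise"}


@dataclass
class QualityTriplet:
    """One (image, quality text, distortion segmentation) sample (assumed layout)."""

    image: np.ndarray            # H x W x 3 pixel array
    quality_text: str            # free-form quality description
    distortion_mask: np.ndarray  # H x W array of integer label ids

    def __post_init__(self) -> None:
        # The mask must label every pixel of the image exactly once.
        if self.image.shape[:2] != self.distortion_mask.shape:
            raise ValueError("image and distortion mask must share height and width")


def distortion_coverage(sample: QualityTriplet) -> dict[str, float]:
    """Return the fraction of pixels assigned to each label present in the mask."""
    ids, counts = np.unique(sample.distortion_mask, return_counts=True)
    total = sample.distortion_mask.size
    return {
        DISTORTION_LABELS.get(int(i), f"unknown_{int(i)}"): float(c) / total
        for i, c in zip(ids, counts)
    }
```

A per-label coverage summary like this is one simple way to relate the text description to the mask, for example checking that a caption mentioning blur accompanies a mask with a non-zero blur region.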

Code & Models

Repositories

q-future/q-ground
Official

Datasets

chaofengc/QGround-100K
dataset · 164 downloads
