Q-Ground: Image Quality Grounding with Large Multi-modality Models
Chaofeng Chen, Sensen Yang, Haoning Wu, Liang Liao, Zicheng Zhang, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin

TL;DR
Q-Ground introduces a novel framework combining large multi-modality models with a new dataset to enable detailed, region-aware image quality assessment and explanation through text prompts.
Contribution
The paper presents the first approach for fine-scale visual quality grounding using large multi-modality models and introduces the QGround-100K dataset for this purpose.
Findings
Effective multi-scale feature learning for quality assessment
Dual capability for image quality answering and distortion segmentation
Improved robustness with automatically labeled data
Abstract
Recent advances in large multi-modality models (LMM) have greatly improved the ability of image quality assessment (IQA) methods to evaluate and explain the quality of visual content. However, these advancements are mostly focused on overall quality assessment, and the detailed examination of local quality, which is crucial for comprehensive visual understanding, is still largely unexplored. In this work, we introduce Q-Ground, the first framework aimed at tackling fine-scale visual quality grounding by combining large multi-modality models with detailed visual quality analysis. Central to our contribution is the introduction of the QGround-100K dataset, a novel resource containing 100k triplets of (image, quality text, distortion segmentation) to facilitate deep investigations into visual quality. The dataset comprises two parts: one with human-labeled annotations for accurate quality…
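To make the (image, quality text, distortion segmentation) triplet concrete, here is a minimal Python sketch of how one such sample might be represented. The class name, field names, file name, and the convention that 0 means "no distortion" are all assumptions made for illustration; the abstract does not specify the actual storage format of QGround-100K.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QualityGroundingSample:
    """Hypothetical container for one (image, quality text, segmentation) triplet."""
    image_path: str                   # assumed: path to the image file
    quality_text: str                 # free-form description of the image quality
    distortion_mask: List[List[int]]  # assumed: per-pixel labels, 0 = no distortion

    def distorted_fraction(self) -> float:
        """Share of pixels carrying any non-zero distortion label."""
        total = sum(len(row) for row in self.distortion_mask)
        if total == 0:
            return 0.0
        distorted = sum(1 for row in self.distortion_mask for v in row if v != 0)
        return distorted / total


# Toy 2x2 example: one of four pixels is marked as distorted.
sample = QualityGroundingSample(
    image_path="example.jpg",
    quality_text="The lower-right corner is blurred.",
    distortion_mask=[[0, 0], [0, 1]],
)
print(sample.distorted_fraction())
```

A real pipeline would more likely store the mask as an image or array, but the nested list keeps the sketch free of dependencies.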
