R4-CGQA: Retrieval-based Vision Language Models for Computer Graphics Image Quality Assessment
Zhuangzi Li, Jian Jin, Shilv Cai, Weisi Lin

TL;DR
This paper introduces R4-CGQA, a retrieval-augmented vision language model (VLM) framework for computer graphics (CG) image quality assessment. It also builds a new dataset of CG quality descriptions and question-answer benchmarks that expose the limitations of existing VLMs, and shows that retrieval improves their quality judgments.
Contribution
The paper constructs a new dataset of 3500 CG images with quality descriptions and develops a retrieval-based framework that makes VLMs' CG quality assessments more accurate.
Findings
Current VLMs struggle with fine-grained CG quality judgment.
Retrieval-augmented generation significantly improves assessment performance.
Descriptions of similar images boost VLM understanding of CG quality.
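The retrieval step that the findings credit for the improvement can be illustrated with a minimal sketch. It assumes each database image already has an embedding from some image encoder and a stored quality description. The record type, the cosine-similarity ranking, the value of k, and the prompt template are all hypothetical choices made for illustration, not the R4-CGQA implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class CGRecord:
    """Hypothetical database entry: an image embedding plus its quality description."""
    image_id: str
    embedding: list[float]  # assumed output of some image encoder (not specified here)
    description: str


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: list[float], database: list[CGRecord], k: int = 2) -> list[CGRecord]:
    """Return the k database records whose embeddings are most similar to the query."""
    ranked = sorted(database, key=lambda r: cosine(query, r.embedding), reverse=True)
    return ranked[:k]


def build_prompt(question: str, neighbours: list[CGRecord]) -> str:
    """Prepend retrieved descriptions as reference context for the VLM (hypothetical template)."""
    context = "\n".join(
        f"Reference {i + 1} ({r.image_id}): {r.description}"
        for i, r in enumerate(neighbours)
    )
    return f"{context}\n\nQuestion about the query image: {question}"


# Toy usage with hand-made 3-d "embeddings" (illustrative values only).
db = [
    CGRecord("cg_001", [1.0, 0.0, 0.0], "Realistic indoor scene; sharp textures, mild aliasing on edges."),
    CGRecord("cg_002", [0.0, 1.0, 0.0], "Cartoon-style character; flat shading, clean outlines."),
    CGRecord("cg_003", [0.9, 0.1, 0.0], "Realistic street scene; noticeable shadow flicker and blur."),
]
top = retrieve([1.0, 0.05, 0.0], db, k=2)
print([r.image_id for r in top])  # ['cg_001', 'cg_003']
print(build_prompt("Is there visible aliasing?", top))
```

The design choice illustrated here is that the VLM itself is left unchanged: only its input is enriched with descriptions of visually similar CG images, which gives it concrete quality vocabulary to compare against.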
Abstract
Immersive Computer Graphics (CG) rendering has become ubiquitous in modern daily life. However, comprehensively evaluating CG quality remains challenging for two reasons: first, existing CG datasets lack systematic descriptions of rendering quality; and second, existing CG quality assessment methods cannot provide reasonable text-based explanations. To address these issues, we first identify six key perceptual dimensions of CG quality from the user perspective and construct a dataset of 3500 CG images with corresponding quality descriptions. Each description covers CG style, content, and perceived quality along the selected dimensions. Furthermore, we use a subset of the dataset to build several question-answer benchmarks based on the descriptions to evaluate the responses of existing Vision Language Models (VLMs). We find that current VLMs are not sufficiently accurate in…
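The dataset and benchmarks described in the abstract can be pictured with a hypothetical data layout. The sketch assumes each description is stored as style, content, and one free-text judgment per perceptual dimension (the dimension names are placeholders, since the six dimensions are not listed here), and that a benchmark item is a multiple-choice question scored by exact-match accuracy. Neither format is confirmed by the paper.

```python
from dataclasses import dataclass, field


@dataclass
class CGDescription:
    """Hypothetical record layout for one annotated CG image."""
    image_id: str
    style: str    # e.g. "realistic", "cartoon" (illustrative values)
    content: str  # short summary of what the image shows
    # Perceived quality per dimension; keys are placeholders, not the paper's six dimensions.
    quality: dict[str, str] = field(default_factory=dict)


@dataclass
class QAItem:
    """One multiple-choice question derived from a description (assumed format)."""
    qid: str
    image_id: str
    question: str
    options: list[str]
    answer: str


def accuracy(items: list[QAItem], predictions: dict[str, str]) -> float:
    """Fraction of questions whose predicted option exactly matches the reference answer.

    predictions maps a question id (qid) to the option string chosen by a VLM;
    missing predictions count as wrong.
    """
    if not items:
        return 0.0
    correct = sum(1 for it in items if predictions.get(it.qid) == it.answer)
    return correct / len(items)


# Toy usage (all values illustrative).
record = CGDescription(
    "cg_001", style="realistic", content="indoor living room",
    quality={"dimension_1": "slight aliasing on table edges", "dimension_2": "textures are sharp"},
)
items = [
    QAItem("q1", record.image_id, "Does the scene show aliasing?", ["Yes", "No"], "Yes"),
    QAItem("q2", record.image_id, "What is the rendering style?", ["realistic", "cartoon"], "realistic"),
]
preds = {"q1": "Yes", "q2": "cartoon"}
print(accuracy(items, preds))  # 0.5
```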
Taxonomy
Topics: Multimodal Machine Learning Applications · Visual Attention and Saliency Detection · Image and Video Quality Assessment
