CAMP-VQA: Caption-Embedded Multimodal Perception for No-Reference Quality Assessment of Compressed Video
Xinyi Wang, Angeliki Katsenou, Junxiao Shen, David Bull

TL;DR
CAMP-VQA is a no-reference video quality assessment framework that uses the semantic understanding of large vision-language models to predict perceived video quality without manual fine-grained annotations.
Contribution
It proposes a quality-aware prompting mechanism for large vision-language models that integrates video metadata (e.g., resolution, frame rate, bitrate) with inter-frame variations for fine-grained quality assessment.
Findings
Outperforms existing NR-VQA methods on UGC datasets.
Achieves high correlation scores (SRCC: 0.928, PLCC: 0.938).
Demonstrates effectiveness without manual fine-grained annotations.
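The two correlation scores above are standard agreement measures between predicted quality and mean opinion scores (MOS): SRCC is the Spearman rank correlation (monotonic agreement) and PLCC is the Pearson linear correlation (linear agreement). The sketch below shows how they are typically computed; the function names and the toy numbers are illustrative assumptions, not values or code from the paper. VQA evaluations often fit a nonlinear mapping to the predictions before computing PLCC, and that step is omitted here.

```python
from math import sqrt


def plcc(x, y):
    """Pearson linear correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def srcc(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return plcc(average_ranks(x), average_ranks(y))


# Toy data (made up): predictions that are monotonic in MOS but not linear.
predicted = [1.0, 2.0, 3.0, 4.0]
mos = [1.0, 4.0, 9.0, 16.0]
print(f"SRCC = {srcc(predicted, mos):.3f}, PLCC = {plcc(predicted, mos):.3f}")
```

In this toy case the ranking is perfect, so SRCC reaches its maximum, while PLCC stays below 1 because the relationship is not linear.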
Abstract
The prevalence of user-generated content (UGC) on platforms such as YouTube and TikTok has rendered no-reference (NR) perceptual video quality assessment (VQA) vital for optimizing video delivery. Nonetheless, the characteristics of non-professional acquisition and the subsequent transcoding of UGC video on sharing platforms present significant challenges for NR-VQA. Although NR-VQA models attempt to infer mean opinion scores (MOS), their modeling of subjective scores for compressed content remains limited due to the absence of fine-grained perceptual annotations of artifact types. To address these challenges, we propose CAMP-VQA, a novel NR-VQA framework that exploits the semantic understanding capabilities of large vision-language models. Our approach introduces a quality-aware prompting mechanism that integrates video metadata (e.g., resolution, frame rate, bitrate) with key…
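As a rough illustration of what a metadata-aware prompt could look like, the sketch below formats resolution, frame rate, and bitrate into a text prompt for a vision-language model. The abstract shown here is truncated and does not give the actual prompt, so the `VideoMetadata` class, the `build_quality_prompt` function, and the prompt wording are all hypothetical and should not be read as the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class VideoMetadata:
    """Hypothetical container for the metadata fields named in the abstract."""
    width: int
    height: int
    frame_rate: float
    bitrate_kbps: float


def build_quality_prompt(meta: VideoMetadata) -> str:
    """Assemble an illustrative quality-aware prompt (wording is an assumption)."""
    return (
        "You are assessing the perceptual quality of a compressed video.\n"
        f"Resolution: {meta.width}x{meta.height}\n"
        f"Frame rate: {meta.frame_rate:g} fps\n"
        f"Bitrate: {meta.bitrate_kbps:g} kbps\n"
        "Describe any visible compression artifacts in the provided frames "
        "and their likely impact on perceived quality."
    )


# Example values are made up for demonstration.
prompt = build_quality_prompt(VideoMetadata(1920, 1080, 30.0, 2500.0))
print(prompt)
```

Keeping the metadata in a small typed structure makes it easy to vary one field at a time, for example the bitrate, and observe how the model's description changes.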
Taxonomy
Topics: Image and Video Quality Assessment · Video Analysis and Summarization · Visual Attention and Saliency Detection
