CAMP-VQA: Caption-Embedded Multimodal Perception for No-Reference Quality Assessment of Compressed Video

Xinyi Wang; Angeliki Katsenou; Junxiao Shen; David Bull

arXiv:2511.07290 · eess.IV · November 11, 2025

TL;DR

CAMP-VQA is a no-reference video quality assessment (NR-VQA) framework that uses the semantic understanding of large vision-language models to predict perceived video quality without manual fine-grained artifact annotations.

Contribution

The paper proposes a multimodal approach in which a quality-aware prompting mechanism combines video metadata (e.g., resolution, frame rate, bitrate) with inter-frame variations, so that a large vision-language model can assess quality at a fine-grained level.

Findings

01. Outperforms existing NR-VQA methods on UGC datasets.

02. Achieves high correlation scores (SRCC: 0.928, PLCC: 0.938).

03. Demonstrates effectiveness without manual fine-grained annotations.
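
SRCC (Spearman rank correlation coefficient) and PLCC (Pearson linear correlation coefficient) measure how well predicted scores agree with mean opinion scores (MOS): SRCC captures monotonic agreement of rankings, PLCC captures linear agreement. The sketch below shows one way such scores could be computed using only the Python standard library. The sample scores are invented for illustration and are not taken from the paper, and the helper names (`pearson`, `average_ranks`, `spearman`) are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC) of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example data: subjective MOS and model predictions for five videos.
mos = [1.8, 2.5, 3.1, 3.9, 4.6]
predicted = [2.0, 2.4, 3.5, 3.6, 4.9]

print(f"SRCC: {spearman(mos, predicted):.3f}")
print(f"PLCC: {pearson(mos, predicted):.3f}")
```

VQA studies often fit a nonlinear (typically logistic) mapping to the predictions before computing PLCC; that step is omitted from this sketch.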

Abstract

The prevalence of user-generated content (UGC) on platforms such as YouTube and TikTok has rendered no-reference (NR) perceptual video quality assessment (VQA) vital for optimizing video delivery. Nonetheless, the characteristics of non-professional acquisition and the subsequent transcoding of UGC video on sharing platforms present significant challenges for NR-VQA. Although NR-VQA models attempt to infer mean opinion scores (MOS), their modeling of subjective scores for compressed content remains limited due to the absence of fine-grained perceptual annotations of artifact types. To address these challenges, we propose CAMP-VQA, a novel NR-VQA framework that exploits the semantic understanding capabilities of large vision-language models. Our approach introduces a quality-aware prompting mechanism that integrates video metadata (e.g., resolution, frame rate, bitrate) with key…
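
As a rough illustration of what integrating video metadata into a prompt might look like, the sketch below assembles the three metadata fields named in the abstract (resolution, frame rate, bitrate) into a text prompt. The exact prompt used by CAMP-VQA is not given here, so the wording, the `VideoMetadata` class, and the `build_quality_prompt` function are all hypothetical assumptions rather than the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoMetadata:
    """Hypothetical container for the metadata fields named in the abstract."""
    width: int
    height: int
    frame_rate: float
    bitrate_kbps: float


def build_quality_prompt(meta: VideoMetadata) -> str:
    """Assemble an illustrative quality-aware prompt (wording is an assumption)."""
    return (
        "You are assessing the perceptual quality of a compressed video.\n"
        f"Resolution: {meta.width}x{meta.height}\n"
        f"Frame rate: {meta.frame_rate:g} fps\n"
        f"Bitrate: {meta.bitrate_kbps:g} kbps\n"
        "Describe any visible compression artifacts and their severity."
    )


# Example values are invented for illustration.
prompt = build_quality_prompt(VideoMetadata(1920, 1080, 30.0, 2500.0))
print(prompt)
```

Keeping the metadata in a small typed structure makes it straightforward to extend the prompt with further fields without changing call sites.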

Code & Models

Models

xinyiW915/CAMP-VQA (model on Hugging Face)

Taxonomy

Topics: Image and Video Quality Assessment · Video Analysis and Summarization · Visual Attention and Saliency Detection