CLIPVQA: Video Quality Assessment via CLIP
Fengchuang Xing, Mingjie Li, Yuan-Gen Wang, Guopu Zhu, and Xiaochun Cao

TL;DR
CLIPVQA introduces a novel CLIP-based Transformer approach for video quality assessment, leveraging rich spatiotemporal features and language descriptions to achieve state-of-the-art performance and improved generalizability across diverse datasets.
Contribution
The paper presents a new CLIP-based Transformer framework for VQA that effectively integrates spatiotemporal features and language descriptions, outperforming existing methods.
Findings
Achieves state-of-the-art VQA performance on eight datasets.
Up to 37% better generalizability than benchmark methods.
Validates effectiveness through comprehensive ablation studies.
Abstract
In learning vision-language representations from web-scale data, the contrastive language-image pre-training (CLIP) mechanism has demonstrated remarkable performance in many vision tasks. However, its application to the widely studied video quality assessment (VQA) task is still an open issue. In this paper, we propose an efficient and effective CLIP-based Transformer method for the VQA problem (CLIPVQA). Specifically, we first design an effective video frame perception paradigm with the goal of extracting the rich spatiotemporal quality and content information among video frames. Then, the spatiotemporal quality features are integrated using a self-attention mechanism to yield a video-level quality representation. To utilize the quality language descriptions of videos for supervision, we develop a CLIP-based encoder for language embedding, which is then fully…
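The aggregation step described in the abstract, where per-frame features are combined by self-attention into one video-level representation, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single attention head, identity query/key/value projections, and mean pooling over the attended frames, and the names `self_attention`, `video_representation`, and the toy `frame_features` values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def self_attention(frames):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the frame features
    themselves (identity projections), purely for illustration.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    attended = []
    for query in frames:
        # Attention weights of this frame over every frame in the clip.
        weights = softmax([dot(query, key) / scale for key in frames])
        # Weighted sum of the value vectors.
        attended.append(
            [sum(w * value[i] for w, value in zip(weights, frames)) for i in range(dim)]
        )
    return attended


def video_representation(frames):
    """Mean-pool the attended frame features into one video-level vector."""
    attended = self_attention(frames)
    count = len(attended)
    return [sum(frame[i] for frame in attended) / count for i in range(len(attended[0]))]


# Toy per-frame features (hypothetical values): 3 frames, 3 dimensions each.
frame_features = [
    [1.0, 0.0, 0.5],
    [0.9, 0.1, 0.4],
    [0.0, 1.0, 0.2],
]
video_feature = video_representation(frame_features)
```

Because each attended frame is a convex combination of the input frames, every component of `video_feature` stays within the range of the corresponding input component. A real model would use learned projections and multiple heads.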
Taxonomy
Topics: Video Analysis and Summarization · Image Retrieval and Classification Techniques · Advanced Data Compression Techniques
Methods: Linear Layer · Multi-Head Attention · Attention Is All You Need · Softmax · Byte Pair Encoding · Layer Normalization · Concatenated Skip Connection · Label Smoothing · Absolute Position Encodings · Position-Wise Feed-Forward Layer
