Towards Explainable In-the-Wild Video Quality Assessment: A Database and a Language-Prompted Approach
Haoning Wu, Erli Zhang, Liang Liao, Chaofeng Chen, Jingwen Hou, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin

TL;DR
This paper introduces a multi-dimensional database for in-the-wild video quality assessment and proposes a language-prompted model, MaxVQA, that jointly evaluates specific quality factors and overall video quality with high accuracy.
Contribution
The paper presents the Maxwell database with detailed quality factor annotations and a novel CLIP-based language-prompted VQA method, MaxVQA, for comprehensive quality evaluation.
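As a rough illustration of what language-prompted scoring with a CLIP-style model can look like, the sketch below compares a video embedding against a positive and a negative text-prompt embedding and squashes the similarity gap into a score between 0 and 1. It is not the authors' implementation: the function name `prompt_pair_score`, the `scale` value, and the random vectors standing in for CLIP embeddings are all assumptions made for this example.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def prompt_pair_score(visual, positive, negative, scale=10.0):
    """Hypothetical helper: map the similarity gap between a positive prompt
    (e.g. "a sharp video") and a negative prompt (e.g. "a blurry video")
    to a score in (0, 1) with a sigmoid. `scale` is an assumed temperature."""
    gap = cosine(visual, positive) - cosine(visual, negative)
    return float(1.0 / (1.0 + np.exp(-scale * gap)))


# Toy stand-ins for embeddings; a real pipeline would obtain these from
# CLIP's text and visual encoders instead of a random generator.
rng = np.random.default_rng(0)
dim = 16
positive = rng.normal(size=dim)
positive /= np.linalg.norm(positive)
negative = rng.normal(size=dim)
negative /= np.linalg.norm(negative)

# A "video" embedding constructed to lie closer to the positive prompt.
visual = 0.8 * positive + 0.2 * negative
score = prompt_pair_score(visual, positive, negative)

# An embedding equidistant from both prompts gets a neutral score.
neutral = prompt_pair_score(0.5 * positive + 0.5 * negative, positive, negative)
```

Repeating the same comparison with one prompt pair per quality factor (sharpness, noise, flicker, and so on) would give one score per dimension, which is the general idea behind evaluating specific factors alongside overall quality.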
Findings
MaxVQA achieves state-of-the-art accuracy across all quality dimensions.
The Maxwell database enables detailed analysis of quality factors and their relation to subjective scores.
MaxVQA generalizes well to existing datasets, demonstrating robustness.
Abstract
The proliferation of in-the-wild videos has greatly expanded the Video Quality Assessment (VQA) problem. Unlike early definitions that usually focus on limited distortion types, VQA on in-the-wild videos is especially challenging as it could be affected by complicated factors, including various distortions and diverse contents. Though subjective studies have collected overall quality scores for these videos, how the abstract quality scores relate with specific factors is still obscure, hindering VQA methods from more concrete quality evaluations (e.g. sharpness of a video). To solve this problem, we collect over two million opinions on 4,543 in-the-wild videos on 13 dimensions of quality-related factors, including in-capture authentic distortions (e.g. motion blur, noise, flicker), errors introduced by compression and transmission, and higher-level experiences on semantic contents and…
Taxonomy
Topics: Image and Video Quality Assessment · Advanced Image Processing Techniques · Visual Attention and Saliency Detection
Methods: Contrastive Language-Image Pre-training
