TL;DR
This paper introduces DIVA-VQA, a no-reference video quality assessment (NR-VQA) model that analyzes inter-frame variations in user-generated videos to predict perceptual quality. It reports top-tier accuracy at low runtime complexity.
Contribution
The paper presents a novel NR-VQA model leveraging inter-frame differences and spatio-temporal fragmentation, improving accuracy and efficiency over existing methods.
Findings
Ranked in the top two on five UGC datasets on correlation metrics.
Outperformed state-of-the-art models in accuracy while maintaining low runtime complexity.
Demonstrated effectiveness of inter-frame variation analysis in perceptual video quality assessment.
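The summary does not name the correlation metrics behind these rankings. In VQA work the conventional pair is the Spearman rank-order correlation coefficient (SROCC) and the Pearson linear correlation coefficient (PLCC) between predicted scores and mean opinion scores, so the sketch below assumes those two. The function names and the sample scores are illustrative and are not taken from the paper.

```python
import numpy as np

def average_ranks(x):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def plcc(pred, mos):
    """Pearson linear correlation between predictions and opinion scores."""
    a = np.asarray(pred, dtype=float)
    b = np.asarray(mos, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

def srocc(pred, mos):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return plcc(average_ranks(pred), average_ranks(mos))

if __name__ == "__main__":
    # Made-up scores, for illustration only.
    predicted = [3.1, 2.4, 4.0, 1.8, 3.6]
    mos = [3.0, 2.0, 4.5, 2.2, 3.9]
    print("SROCC:", srocc(predicted, mos))
    print("PLCC:", plcc(predicted, mos))
```

SROCC rewards getting the ordering of videos right regardless of scale, while PLCC measures linear agreement. Published PLCC figures are often computed after fitting a nonlinear mapping to the predictions, a step this sketch omits.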
Abstract
The rapid growth of user-generated (video) content (UGC) has driven increased demand for research on no-reference (NR) perceptual video quality assessment (VQA). NR-VQA is a key component for large-scale video quality monitoring in social media and streaming applications where a pristine reference is not available. This paper proposes a novel NR-VQA model based on spatio-temporal fragmentation driven by inter-frame variations. By leveraging these inter-frame differences, the model progressively analyses quality-sensitive regions at multiple levels: frames, patches, and fragmented frames. It integrates frames, fragmented residuals, and fragmented frames aligned with residuals to effectively capture global and local information. The model extracts both 2D and 3D features in order to characterize these spatio-temporal variations. Experiments conducted on five UGC datasets and against…
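The idea of using inter-frame differences to pick quality-sensitive patches can be illustrated with a minimal sketch. It assumes grayscale frames stored as NumPy arrays, scores each patch by the mean absolute residual between consecutive frames, and tiles the highest-scoring patches into a smaller "fragmented" frame. The patch size, the number of patches kept, the tiling order, and all function names are hypothetical choices for illustration; the paper's actual selection and alignment rules may differ.

```python
import numpy as np

def frame_residual(prev_frame, curr_frame):
    """Absolute difference between two consecutive grayscale frames."""
    return np.abs(curr_frame.astype(np.float32) - prev_frame.astype(np.float32))

def top_patches(residual, patch=16, k=4):
    """Return (row, col) grid coordinates of the k patches with most change."""
    h, w = residual.shape
    gh, gw = h // patch, w // patch
    # Crop to a whole number of patches, then view as a grid of blocks.
    blocks = residual[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch)
    energy = blocks.mean(axis=(1, 3))  # shape (gh, gw)
    best = np.argsort(energy.ravel())[::-1][:k]
    return [(int(i // gw), int(i % gw)) for i in best]

def fragment(frame, coords, patch=16):
    """Tile the selected patches of `frame` into a compact square image."""
    side = int(np.ceil(np.sqrt(len(coords))))
    out = np.zeros((side * patch, side * patch), dtype=frame.dtype)
    for n, (r, c) in enumerate(coords):
        rr, cc = divmod(n, side)
        out[rr * patch:(rr + 1) * patch, cc * patch:(cc + 1) * patch] = \
            frame[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch]
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    prev_frame = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
    curr_frame = prev_frame.copy()
    curr_frame[16:32, 32:48] = 255 - curr_frame[16:32, 32:48]  # local change
    res = frame_residual(prev_frame, curr_frame)
    coords = top_patches(res, patch=16, k=4)
    print(coords[0], fragment(curr_frame, coords).shape)
```

Applying the same coordinates to both the frame and its residual yields a fragmented frame and a fragmented residual that cover the same regions, which is one plausible reading of "fragmented frames aligned with residuals" in the abstract.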
