TL;DR
This paper introduces enhanced video quality assessment models, SpatioTemporal VMAF and Ensemble VMAF, which incorporate temporal features for improved accuracy while maintaining computational efficiency, building on the existing VMAF framework.
Contribution
The paper proposes two novel models that better exploit temporal video features within the VMAF framework for more accurate quality assessment.
Findings
Both models outperform existing approaches on a large subjective database.
Models are computationally efficient and suitable for real-world deployment.
Open-source implementation is provided for community use.
Abstract
Perceptual video quality assessment models are either frame-based or video-based; the latter apply spatiotemporal filtering or motion estimation to capture temporal video distortions. Despite their good performance on video quality databases, video-based approaches are time-consuming and harder to deploy efficiently. To balance high performance and computational efficiency, Netflix developed the Video Multi-method Assessment Fusion (VMAF) framework, which integrates multiple quality-aware features to predict video quality. Nevertheless, this fusion framework does not fully exploit temporal video quality measurements, which are relevant to temporal video distortions. To this end, we propose two improvements to the VMAF framework: SpatioTemporal VMAF and Ensemble VMAF. Both algorithms exploit efficient temporal video features which are fed into a single or multiple regression…
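The fusion idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The feature values and scores are synthetic, and the frame-difference feature is a simplified motion proxy. Ridge regression stands in for whatever regressor the paper uses. Averaging two models is only one assumed way of combining several regressors. All function and variable names are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, lam=1e-3):
    """Closed-form ridge regression with a bias column; returns the weights."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def predict(w, X):
    """Apply weights fitted by fit_ridge to a feature matrix."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w

def temporal_feature(frames):
    """Mean absolute difference between consecutive frames (simple motion proxy)."""
    frames = np.asarray(frames, dtype=np.float64)
    return float(np.mean(np.abs(np.diff(frames, axis=0))))

# Synthetic stand-ins for per-video features and subjective scores (assumption).
rng = np.random.default_rng(0)
n = 200
spatial = rng.uniform(0, 1, size=(n, 3))   # hypothetical spatial quality features
temporal = rng.uniform(0, 1, size=(n, 2))  # hypothetical temporal quality features
scores = 100 * (0.5 * spatial.mean(axis=1) + 0.5 * temporal.mean(axis=1))
scores = scores + rng.normal(0, 1, n)

# Single-model variant: all features are fused by one regressor.
X_all = np.hstack([spatial, temporal])
w_single = fit_ridge(X_all, scores)
pred_single = predict(w_single, X_all)

# Multi-model variant (assumed form): one regressor per feature group,
# with the individual predictions averaged.
w_spatial = fit_ridge(spatial, scores)
w_temporal = fit_ridge(temporal, scores)
pred_ensemble = 0.5 * (predict(w_spatial, spatial) + predict(w_temporal, temporal))
```

In a real pipeline, functions such as `temporal_feature` would be computed per video from decoded frames, and the regressors would be trained on subjective scores from a quality database rather than on synthetic data.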
