Deep Local and Global Spatiotemporal Feature Aggregation for Blind Video Quality Assessment
Wei Zhou, Zhibo Chen

TL;DR
This paper introduces DeepSTQ, a no-reference (blind) video quality assessment method that uses pre-trained deep models, without fine-tuning, to extract and aggregate local and global spatiotemporal features, outperforming state-of-the-art VQA algorithms.
Contribution
It proposes a novel deep learning-based no-reference VQA approach that uses pre-trained models for feature extraction without fine-tuning or training from scratch.
Findings
DeepSTQ outperforms state-of-the-art VQA algorithms.
Utilizes pre-trained models for efficient feature extraction.
Effective in assessing various distorted videos.
Abstract
In recent years, deep learning has achieved promising success in multimedia quality assessment, especially image quality assessment (IQA). However, because videos exhibit more complex temporal characteristics, very little work on video quality assessment (VQA) has exploited powerful deep convolutional neural networks (DCNNs). In this paper, we propose an efficient VQA method named Deep SpatioTemporal video Quality assessor (DeepSTQ) to predict the perceptual quality of various distorted videos in a no-reference manner. In DeepSTQ, we first extract local and global spatiotemporal features using pre-trained deep learning models, without fine-tuning or training from scratch. The composite features cover both distorted video frames and frame difference maps, from both global and local views. Then, the feature aggregation is conducted by the regression…
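The abstract describes the feature pipeline only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes grayscale frames stored as a (T, H, W) array, treats the whole frame as the "global" view and non-overlapping square patches as the "local" view, computes frame difference maps as absolute differences between consecutive frames, and mean-pools descriptors over time. `toy_extractor` is a hypothetical stand-in for a frozen pre-trained DCNN (it just returns the mean and standard deviation of its input); the actual backbone and the regression-based aggregation step are not specified in the text above and are not reproduced here.

```python
import numpy as np


def frame_difference_maps(frames: np.ndarray) -> np.ndarray:
    """Absolute difference of consecutive frames: (T, H, W) -> (T-1, H, W)."""
    return np.abs(np.diff(frames.astype(np.float64), axis=0))


def toy_extractor(img: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a frozen pre-trained DCNN: returns [mean, std]."""
    return np.array([img.mean(), img.std()])


def local_global_features(img: np.ndarray, patch: int = 8,
                          extractor=toy_extractor) -> np.ndarray:
    """Concatenate a global descriptor (whole image) with the mean of
    local descriptors computed on non-overlapping patch x patch tiles.
    Assumes the image is at least patch x patch pixels."""
    global_feat = extractor(img)
    h, w = img.shape
    patches = [img[i:i + patch, j:j + patch]
               for i in range(0, h - patch + 1, patch)
               for j in range(0, w - patch + 1, patch)]
    local_feat = np.mean([extractor(p) for p in patches], axis=0)
    return np.concatenate([global_feat, local_feat])


def video_descriptor(frames: np.ndarray, patch: int = 8,
                     extractor=toy_extractor) -> np.ndarray:
    """Spatial branch (distorted frames) + temporal branch (difference maps),
    each averaged over time, then concatenated into one video-level vector."""
    spatial = np.mean([local_global_features(f, patch, extractor)
                       for f in frames], axis=0)
    temporal = np.mean([local_global_features(d, patch, extractor)
                        for d in frame_difference_maps(frames)], axis=0)
    return np.concatenate([spatial, temporal])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = rng.random((5, 32, 32))           # 5 toy grayscale frames
    print(video_descriptor(video).shape)      # 2 branches x (2 global + 2 local)
```

In a real system, the video-level vector would then be fed to a regressor trained on subjective quality scores; which regressor DeepSTQ uses is cut off in the abstract above, so that step is left out of the sketch. Swapping `toy_extractor` for pooled activations of a frozen pre-trained network is the intended extension point.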
Taxonomy
Topics: Image and Video Quality Assessment · Advanced Image Processing Techniques · Advanced Image Fusion Techniques
