Comment on "No-Reference Video Quality Assessment Based on the Temporal Pooling of Deep Features"
Franz Götz-Hahn, Vlad Hosu, Dietmar Saupe

TL;DR
This paper critically re-examines a neural-network-based video quality assessment method. It shows that the previously reported performance was inflated by data leakage, which underscores the importance of proper evaluation protocols.
Contribution
The authors identify data leakage as the key flaw in the prior work. They also provide a careful reimplementation, which shows that the method's actual performance is substantially lower than originally reported.
Findings
Original performance claims were inflated due to data leakage
Under a proper evaluation, performance falls well below the state of the art
Highlights the importance of rigorous validation in machine learning for video quality assessment
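In this setting, rigorous validation means that test videos never reach any stage that shapes the model, whether that stage is fine-tuning or model selection. The following is a minimal sketch under assumed conditions: a single random split at the video level, with hypothetical helper names (`split_videos`, `check_no_leakage`). It is not code from either paper. It only shows how both leakage routes named in the abstract can be checked explicitly.

```python
import random


def split_videos(video_ids, test_fraction=0.2, seed=0):
    """Split at the video level so no video lands in both sets (hypothetical helper)."""
    ids = sorted(set(video_ids))
    random.Random(seed).shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    return set(ids[n_test:]), set(ids[:n_test])  # (train, test)


def check_no_leakage(test_ids, finetune_ids, selection_ids):
    """Raise if any test video was used for fine-tuning or for model selection."""
    for stage, used in (("fine-tuning", finetune_ids), ("model selection", selection_ids)):
        leaked = set(used) & set(test_ids)
        if leaked:
            raise ValueError(f"test videos used in {stage}: {sorted(leaked)}")


videos = [f"v{i:03d}" for i in range(10)]
train, test = split_videos(videos, test_fraction=0.2, seed=1)
check_no_leakage(test, finetune_ids=train, selection_ids=train)  # passes silently
```

Passing the full video list as `finetune_ids` would raise, because it includes the held-out test videos. That is the kind of failure such a check is meant to catch before any scores are reported.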
Abstract
In Neural Processing Letters 50(3) (2019), a machine learning approach to blind video quality assessment was proposed. It is based on temporal pooling of video frame features taken from the last pooling layer of deep convolutional neural networks. The method was validated on two established benchmark datasets and gave results far better than the previous state of the art. In this letter we report the results of our careful reimplementations. The performance claimed in the paper cannot be reached; our results are even below the state of the art by a large margin. We show that the originally reported, erroneous performance results are a consequence of two cases of data leakage: information from outside the training dataset was used both in the fine-tuning stage and in the model evaluation.
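The pipeline first extracts one feature vector per frame and then pools those vectors over time into a single video-level descriptor. The sketch below shows that idea under stated assumptions. The per-frame features are taken as precomputed lists, for example activations from a CNN's last pooling layer. The pooling statistics (per-dimension mean and standard deviation) are an illustrative choice and are not confirmed to be the ones used in the original paper. The function name `temporal_pool` is hypothetical.

```python
from statistics import fmean, pstdev


def temporal_pool(frame_features):
    """Pool per-frame feature vectors into one video-level vector (illustrative sketch).

    frame_features: list of equal-length lists, one per frame (assumed precomputed,
    e.g. activations of a CNN's last pooling layer).
    Returns per-dimension means followed by per-dimension standard deviations.
    """
    if not frame_features:
        raise ValueError("need at least one frame")
    if len({len(f) for f in frame_features}) != 1:
        raise ValueError("all frames must have the same feature length")
    dims = list(zip(*frame_features))  # transpose: one tuple per feature dimension
    return [fmean(d) for d in dims] + [pstdev(d) for d in dims]


frames = [[1.0, 4.0], [3.0, 4.0]]
print(temporal_pool(frames))  # [2.0, 4.0, 1.0, 0.0]
```

In a blind VQA setup, a regressor trained on the training split would then map the pooled vector to a quality score. Whatever that regressor is, fitting it or tuning it with test videos reintroduces exactly the leakage described above.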
Taxonomy
Topics: Image and Video Quality Assessment · Digital Media Forensic Detection · Advanced Image Processing Techniques
