Learning Perceptual Representations for Gaming NR-VQA with Multi-Task FR Signals
Yu-Chih Chen, Michael Wang, Chieh-Dun Wen, Kai-Siang Ma, Avinab Saha, Li-Heng Chen, Alan Bovik

TL;DR
This paper introduces MTL-VQA, a multi-task learning framework that leverages full-reference metrics as supervisory signals to improve no-reference video quality assessment for gaming videos, especially when human-rated data is scarce.
Contribution
MTL-VQA pretrains with multi-task learning over several full-reference (FR) objectives, balanced by adaptive task weighting, so that perceptual features are learned without human labels and then transferred to NR-VQA on gaming videos.
Findings
MTL-VQA performs competitively with state-of-the-art NR-VQA methods on gaming video datasets.
This holds in both MOS-supervised and label-efficient/self-supervised settings.
The shared representations learned from the FR objectives transfer effectively to NR-VQA.
Abstract
No-reference video quality assessment (NR-VQA) for gaming videos is challenging due to limited human-rated datasets and unique content characteristics including fast motion, stylized graphics, and compression artifacts. We present MTL-VQA, a multi-task learning framework that uses full-reference metrics as supervisory signals to learn perceptually meaningful features without human labels for pretraining. By jointly optimizing multiple full-reference (FR) objectives with adaptive task weighting, our approach learns shared representations that transfer effectively to NR-VQA. Experiments on gaming video datasets show MTL-VQA achieves performance competitive with state-of-the-art NR-VQA methods across both MOS-supervised and label-efficient/self-supervised settings.
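The abstract does not say which adaptive task weighting scheme MTL-VQA uses. As a minimal sketch of the general idea, the example below assumes homoscedastic-uncertainty weighting, where each task i carries a learnable log-variance s_i and the combined loss is sum_i exp(-s_i) * L_i + s_i. The FR targets named here (SSIM, VMAF, LPIPS) and their loss values are hypothetical placeholders, not taken from the paper.

```python
import math


def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses as sum_i exp(-s_i) * L_i + s_i.

    Assumed weighting scheme for illustration; not confirmed as the
    paper's method.
    """
    if len(task_losses) != len(log_vars):
        raise ValueError("need one log-variance per task")
    return sum(math.exp(-s) * loss + s for loss, s in zip(task_losses, log_vars))


def log_var_gradients(task_losses, log_vars):
    """Gradient of the combined loss w.r.t. each s_i: 1 - exp(-s_i) * L_i."""
    return [1.0 - math.exp(-s) * loss for loss, s in zip(task_losses, log_vars)]


def task_weights(log_vars):
    """Effective weight on each task loss, exp(-s_i)."""
    return [math.exp(-s) for s in log_vars]


# Hypothetical per-task regression losses against three FR targets.
fr_losses = {"ssim": 0.04, "vmaf": 0.50, "lpips": 0.10}
names = list(fr_losses)
losses = [fr_losses[name] for name in names]

# Start with equal weights (s_i = 0) and adapt s_i by gradient descent.
# The losses are held fixed here only to isolate the weighting behaviour;
# in real training they change as the shared network is updated.
log_vars = [0.0] * len(losses)
for _ in range(500):
    grads = log_var_gradients(losses, log_vars)
    log_vars = [s - 0.1 * g for s, g in zip(log_vars, grads)]

weights = dict(zip(names, task_weights(log_vars)))
```

With the losses held fixed, each weight settles where exp(-s_i) * L_i = 1, that is, at 1 / L_i, so tasks with larger loss magnitudes are scaled down and no single FR objective dominates the shared representation.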
Taxonomy
Topics: Image and Video Quality Assessment · Visual Attention and Saliency Detection · Human Pose and Action Recognition
