TL;DR
This paper introduces a unified framework for in-the-wild video quality assessment (VQA) that trains a single model on a mix of datasets and models two human perception effects, content dependency and temporal memory, to improve cross-dataset performance.
Contribution
It proposes a novel unified VQA model with a mixed datasets training strategy and a three-stage framework incorporating human perception principles.
Findings
The mixed datasets training improves cross-dataset generalization.
The unified model outperforms state-of-the-art methods.
The approach effectively models content dependency and temporal-memory effects.
Abstract
Video quality assessment (VQA) is an important problem in computer vision. The videos in computer vision applications are usually captured in the wild. We focus on automatically assessing the quality of in-the-wild videos, which is a challenging problem due to the absence of reference videos, the complexity of distortions, and the diversity of video contents. Moreover, the video contents and distortions among existing datasets are quite different, which leads to poor performance of data-driven methods in the cross-dataset evaluation setting. To improve the performance of quality assessment models, we borrow intuitions from human perception, specifically, content dependency and temporal-memory effects of the human visual system. To face the cross-dataset evaluation challenge, we explore a mixed datasets training strategy for training a single VQA model with multiple datasets. The proposed…
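The idea of training one model on several datasets can be illustrated with a minimal sketch. It rests on an assumption not stated in the abstract: subjective scores from different datasets live on different scales, so the loss is computed per dataset with a scale-invariant measure (here, one minus the Pearson correlation) and then averaged. The function names `pearson` and `mixed_dataset_loss` and the batch layout are hypothetical, chosen for illustration; this is not the paper's actual loss.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson linear correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for constant inputs; treat as 0
    return cov / math.sqrt(var_x * var_y)


def mixed_dataset_loss(batch):
    """Average of per-dataset (1 - Pearson correlation) losses.

    batch: list of (dataset_name, predicted_score, subjective_score).
    Scores are only compared within the same dataset, so differing
    score ranges across datasets do not affect the loss (assumption).
    """
    if not batch:
        raise ValueError("batch must contain at least one sample")
    groups = defaultdict(lambda: ([], []))
    for name, predicted, subjective in batch:
        groups[name][0].append(predicted)
        groups[name][1].append(subjective)
    losses = [1.0 - pearson(preds, mos) for preds, mos in groups.values()]
    return sum(losses) / len(losses)
```

In this sketch, predictions that are linearly related to the subjective scores within each dataset yield a loss near 0, regardless of whether one dataset scores on a 1 to 5 scale and another on 0 to 100.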
Taxonomy
Methods: MDTVSFA
