Scaling-up Perceptual Video Quality Assessment

Ziheng Jia; Zicheng Zhang; Zeyu Zhang; Yingji Liang; Xiaorong Zhu; Chunyi Li; Jinliang Han; Haoning Wu; Bin Wang; Haoran Zhang; Guanyu Zhu; Qiyong Zhao; Xiaohong Liu; Guangtao Zhai; Xiongkuo Min

arXiv:2505.22543 · cs.CV · May 29, 2025

TL;DR

This paper introduces OmniVQA, a scalable framework for perceptual video quality assessment that leverages large, human-in-the-loop datasets and a novel training strategy to achieve state-of-the-art results.

Contribution

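The paper presents OmniVQA, a framework for building multi-modal instruction databases (MIDBs) for VQA, and uses it to create the largest MIDB in the field, along with a fine-grained benchmark and a complementary training strategy to improve quality understanding and rating.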

Findings

01

Achieved state-of-the-art performance in quality understanding.

02

Achieved state-of-the-art performance in quality rating.

03

Created the largest VQA multi-modal instruction database.

Abstract

Data scaling laws have been shown to significantly enhance the performance of large multi-modal models (LMMs) across various downstream tasks. However, in perceptual video quality assessment (VQA), the potential of scaling remains largely unexplored due to the scarcity of labeled resources and the insufficient scale of existing datasets. To address this, we propose OmniVQA, a framework for efficiently building high-quality, human-in-the-loop VQA multi-modal instruction databases (MIDBs). We then scale up to create OmniVQA-Chat-400K, currently the largest MIDB in the VQA field. Our focus is on the technical and aesthetic quality dimensions, with abundant in-context instruction data to provide fine-grained VQA knowledge. Additionally, we have built the OmniVQA-MOS-20K dataset to enhance the model's quantitative quality rating…

Taxonomy

Topics: Image and Video Quality Assessment