Score2Instruct: Scaling Up Video Quality-Centric Instructions via Automated Dimension Scoring

Qizhi Xie; Kun Yuan; Yunpeng Qu; Jiachao Gong; Mingda Wu; Ming Sun; Chao Zhou; Jihong Zhu

arXiv:2506.21011 · cs.CV · March 30, 2026

TL;DR

Score2Instruct introduces an automated, scalable pipeline for generating extensive video quality instruction data, enhancing large multimodal models' ability to assess and justify video quality.

Contribution

The paper presents an automated data generation pipeline and a large instruction dataset that improve video quality assessment and explanation in multimodal models.

Findings

01 Improved video quality scoring accuracy across multiple models.

02 Enhanced ability of models to justify quality assessments.

03 The dataset enables scalable instruction tuning for video LMMs.

Abstract

Classical video quality assessment (VQA) methods generate a numerical score to judge a video's perceived visual fidelity and clarity. Yet a single score fails to describe the video's complex quality dimensions, restricting its applicability. Because video large multimodal models (LMMs) produce human-friendly linguistic output, adapting them to VQA via instruction tuning has the potential to address this issue. The core of this approach is video quality-centric instruction data. Previous explorations mainly focus on the image domain, and their data generation processes rely heavily on human quality annotations and proprietary systems, limiting data scalability and effectiveness. To address these challenges, we propose the Score-based Instruction Generation (SIG) pipeline. Specifically, SIG first scores multiple quality dimensions of an unlabeled video and maps scores to text-defined levels. It then…
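The score-to-level step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The level names, the thresholds, the [0, 1] score range, and the dimension names (`sharpness`, `noise`) are all assumptions made for illustration, since the abstract does not specify them.

```python
from bisect import bisect_right

# Hypothetical level names and cut points; the abstract does not define them.
LEVELS = ["bad", "poor", "fair", "good", "excellent"]
CUTS = [0.2, 0.4, 0.6, 0.8]  # assumed thresholds for scores in [0, 1]


def score_to_level(score: float) -> str:
    """Map a numeric dimension score in [0, 1] to a text-defined level."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    # bisect_right counts how many cut points are <= score,
    # which is exactly the index of the matching level.
    return LEVELS[bisect_right(CUTS, score)]


def describe(scores: dict[str, float]) -> dict[str, str]:
    """Convert per-dimension scores into per-dimension text levels."""
    return {dim: score_to_level(s) for dim, s in scores.items()}


if __name__ == "__main__":
    # Dimension names here are placeholders, not taken from the paper.
    print(describe({"sharpness": 0.85, "noise": 0.35}))
```

Keeping the thresholds in one sorted list means the number of levels can be changed without touching the lookup logic.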
