VQAThinker: Exploring Generalizable and Explainable Video Quality Assessment via Reinforcement Learning
Linhan Cao, Wei Sun, Weixia Zhang, Xiangyang Zhu, Jun Jia, Kaiwei Zhang, Dandan Zhu, Guangtao Zhai, Xiongkuo Min

TL;DR
VQAThinker introduces a reinforcement learning-based framework built on large multimodal models to improve the generalization and explainability of video quality assessment, achieving state-of-the-art results on diverse benchmarks.
Contribution
It proposes a novel reasoning-based VQA framework that combines reinforcement learning with large multimodal models and introduces three VQA-specific reward functions to improve generalization and explainability.
Findings
Achieves state-of-the-art performance on in-domain and out-of-distribution benchmarks.
Demonstrates superior distortion attribution and quality description capabilities.
Validates reinforcement learning as an effective approach for generalizable and explainable VQA.
Abstract
Video quality assessment (VQA) aims to objectively quantify perceptual quality degradation in alignment with human visual perception. Despite recent advances, existing VQA models still suffer from two critical limitations: poor generalization to out-of-distribution (OOD) videos and limited explainability, which restrict their applicability in real-world scenarios. To address these challenges, we propose VQAThinker, a reasoning-based VQA framework that leverages large multimodal models (LMMs) with reinforcement learning to jointly model video quality understanding and scoring, emulating human perceptual decision-making. Specifically, we adopt group relative policy optimization (GRPO), a rule-guided reinforcement learning algorithm that enables reasoning over video quality under score-level supervision, and introduce three VQA-specific rewards: (1) a…
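GRPO, as originally described for language-model reinforcement learning, replaces a learned critic with a group-relative baseline. Several responses are sampled for the same input, each one is scored by rule-based rewards, and each reward is then standardized against the mean and standard deviation of its group. The sketch below shows only that advantage step. The function name, the epsilon, the use of the population standard deviation, and the reward values are all hypothetical placeholders. They are not VQAThinker's three rewards, which the abstract excerpt above truncates.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: standardize each reward within its sampled group.

    `rewards` holds one scalar reward per response sampled for the same video
    and prompt. No learned value model (critic) is used: the group mean acts
    as the baseline. Hypothetical helper, not the paper's code.
    """
    if not rewards:
        raise ValueError("a group must contain at least one response")
    mu = mean(rewards)
    # Population std is an assumption here; some implementations use the sample std.
    sigma = pstdev(rewards)
    # eps avoids division by zero when every response earns the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for a group of G = 4 responses to one video.
group_rewards = [1.0, 0.0, 0.5, 0.5]
print(group_relative_advantages(group_rewards))
```

Responses rewarded above the group mean receive positive advantages and are reinforced, while those below it are discouraged. If every response in a group earns the same reward, all advantages are zero, so that group contributes no learning signal. In full GRPO these advantages weight a clipped policy-ratio objective with a KL penalty toward a reference model, which this sketch omits.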
Taxonomy
Topics: Image and Video Quality Assessment · Advanced Image Processing Techniques · Image and Signal Denoising Methods
