PreResQ-R1: Towards Fine-Grained Rank-and-Score Reinforcement Learning for Visual Quality Assessment via Preference-Response Disentangled Policy Optimization
Zehui Feng, Tian Qiu, Tong Wu, Junxuan Li, Huayuan Xu, Ting Han

TL;DR
PreResQ-R1 introduces a reinforcement learning framework that unifies absolute and relative quality assessment, achieving state-of-the-art results in image and video quality benchmarks with interpretable reasoning.
Contribution
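It proposes a Preference-Response Disentangled RL approach for perceptual quality assessment: a dual-branch reward that separately models intra-sample response coherence and inter-sample preference alignment, optimized with Group Relative Policy Optimization (GRPO).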
Findings
Achieves state-of-the-art results on 10 IQA and 5 VQA benchmarks.
Surpasses previous methods by 5.30% on the IQA benchmarks and 2.15% on the VQA benchmarks.
Produces human-aligned reasoning traces explaining quality judgments.
Abstract
Visual Quality Assessment (QA) seeks to predict human perceptual judgments of visual fidelity. While recent multimodal large language models (MLLMs) show promise in reasoning about image and video quality, existing approaches mainly rely on supervised fine-tuning or rank-only objectives, resulting in shallow reasoning, poor score calibration, and limited cross-domain generalization. We propose PreResQ-R1, a Preference-Response Disentangled Reinforcement Learning framework that unifies absolute score regression and relative ranking consistency within a single reasoning-driven optimization scheme. Unlike prior QA methods, PreResQ-R1 introduces a dual-branch reward formulation that separately models intra-sample response coherence and inter-sample preference alignment, optimized via Group Relative Policy Optimization (GRPO). This design encourages fine-grained, stable, and interpretable…
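The abstract names two ingredients: a dual-branch reward and GRPO. The sketch below illustrates how they could fit together. It is not the authors' implementation. The reward shapes, the function names (`response_reward`, `preference_reward`, `combined_rewards`), and the `weight` and `scale` parameters are all assumptions made for illustration, since this summary does not give the paper's exact formulas. Only the group-wise standardization of rewards follows the standard GRPO recipe.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize rewards within one sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def response_reward(pred, target, scale):
    """Hypothetical absolute-score reward: 1 at an exact match, falling linearly to 0."""
    return max(0.0, 1.0 - abs(pred - target) / scale)


def preference_reward(pred_a, pred_b, mos_a, mos_b):
    """Hypothetical ranking reward: 1 if the predicted order matches the human order."""
    return 1.0 if (pred_a - pred_b) * (mos_a - mos_b) > 0 else 0.0


def combined_rewards(preds_a, pred_b, mos_a, mos_b, weight=0.5, scale=4.0):
    """Blend both branches for each sampled prediction of image A (assumed weighting)."""
    return [
        weight * response_reward(p, mos_a, scale)
        + (1.0 - weight) * preference_reward(p, pred_b, mos_a, mos_b)
        for p in preds_a
    ]


# Four sampled score predictions for image A, compared against one prediction for image B.
# Human mean opinion scores (assumed 1-5 scale): A = 4.0, B = 2.0.
rewards = combined_rewards([4.0, 3.0, 2.0, 1.0], pred_b=2.5, mos_a=4.0, mos_b=2.0)
advantages = group_relative_advantages(rewards)
print(rewards)      # [1.0, 0.875, 0.25, 0.125]
print(advantages)   # highest for the sample that is both accurate and correctly ranked
```

In this toy setup, a sampled response earns credit from the score branch for landing near the human rating and from the ranking branch for ordering the pair correctly. The group standardization then turns those rewards into relative advantages without a learned value function.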
Taxonomy
Topics: Image and Video Quality Assessment · Visual Attention and Saliency Detection · Multimodal Machine Learning Applications
