TL;DR
Refine-IQA is a multi-stage reinforcement fine-tuning framework for image quality assessment (IQA). It explicitly strengthens the model's low-level visual perception and its interpretation ("think") capability, and reports strong results on perception, scoring, and quality-interpreting tasks.
Contribution
The paper proposes a multi-stage reinforcement fine-tuning (RFT) framework, a new dataset (Refine-Perception-20K), and reward functions that improve low-level visual perception and interpretability in IQA models.
Findings
Achieves outstanding performance on perception and scoring tasks.
Activates robust 'think' (interpretation) capabilities in the model.
Excels on the quality interpreting benchmark.
Abstract
Reinforcement fine-tuning (RFT) is a proliferating paradigm for LMM training. Analogous to high-level reasoning tasks, RFT is similarly applicable to low-level vision domains, including image quality assessment (IQA). Existing RFT-based IQA methods typically use rule-based output rewards to verify the model's rollouts but provide no reward supervision for the "think" process, leaving its correctness and efficacy uncontrolled. Furthermore, these methods typically fine-tune directly on downstream IQA tasks without explicitly enhancing the model's native low-level visual quality perception, which may constrain its performance upper bound. In response to these gaps, we propose the multi-stage RFT IQA framework (Refine-IQA). In Stage-1, we build the Refine-Perception-20K dataset (with 12 main distortions, 20,907 locally-distorted images, and over 55K RFT samples) and design multi-task reward…
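To make the gap described in the abstract concrete, here is a minimal sketch of a rule-based output reward of the kind used by existing RFT-based IQA methods. It is not the paper's reward design: the `<think>`/`<answer>` tag format, the function name `score_reward`, and the `tolerance` parameter are all assumptions chosen for illustration.

```python
import re

# Assumed rollout format: "<think>...</think><answer>3.2</answer>".
# The tag names are hypothetical; the source only refers to a "think" process.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def score_reward(rollout: str, target: float, tolerance: float = 0.5) -> float:
    """Return 1.0 if the predicted quality score is within `tolerance`
    of the ground-truth score, else 0.0.

    Only the final answer is checked; the "think" text is never inspected.
    """
    match = ANSWER_RE.search(rollout)
    if match is None:
        return 0.0  # malformed rollout: no answer tag found
    try:
        predicted = float(match.group(1).strip())
    except ValueError:
        return 0.0  # answer is not a number
    return 1.0 if abs(predicted - target) <= tolerance else 0.0
```

Under these assumptions, two rollouts with contradictory reasoning but the same final score receive the same reward, which is the lack of supervision over the "think" process that the abstract points to.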