iDiff: Interpretable Difference-aware Framework for Pairwise Image Quality Assessment

Xinli Yue; JianHui Sun; Tao Shao; Liangchao Yao; Fan Xia; Yuetang Deng

arXiv:2605.19522·cs.CV·May 20, 2026

TL;DR

iDiff is an interpretable framework for pairwise image quality assessment that pairs preference prediction with rationale generation. It placed first in the NTIRE 2026 RAIM challenge.

Contribution

The paper introduces a dual-branch design: an Answer Model that predicts which image of a pair is preferred, and a Thinking Model that generates the accompanying rationale. The split targets both robust prediction and interpretable output in image quality assessment.

Findings

01 Achieved first place in the NTIRE 2026 RAIM challenge.

02 Models both discriminative decision making and structured explanation.

03 Improves both preference accuracy and reasoning quality in pairwise IQA.

Abstract

Pairwise image quality assessment (IQA) in professional photography requires a model not only to identify the preferred image between two candidates, but also to provide convincing and image-grounded reasoning. In the NTIRE 2026 RAIM challenge, this requirement is further emphasized by jointly evaluating preference prediction and rationale generation. To address this task, we propose iDiff, an Interpretable Difference-aware framework for pairwise image quality assessment. Our method adopts a dual-branch design consisting of an Answer Model and a Thinking Model. The Answer Model performs robust preference prediction by explicitly decomposing each sample into left/right global and local views, followed by content-aware specialization for person and scene images and ensemble-based aggregation across backbones. The Thinking Model focuses on rationale generation and is progressively enhanced…
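The view decomposition and ensemble aggregation described for the Answer Model can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: images are plain lists of pixel rows, the "local" view is taken as a center crop, the ensemble is a simple average of per-backbone probabilities, and the names `center_crop`, `decompose_pair`, and `ensemble_preference` are hypothetical.

```python
from statistics import mean


def center_crop(img, frac=0.5):
    """Return the central region of an image given as a list of pixel rows.

    Assumption: a center crop stands in for the 'local' view; the paper may
    select local regions differently.
    """
    h, w = len(img), len(img[0])
    ch, cw = max(1, int(h * frac)), max(1, int(w * frac))
    top, left = (h - ch) // 2, (w - cw) // 2
    return [row[left:left + cw] for row in img[top:top + ch]]


def decompose_pair(left_img, right_img, frac=0.5):
    """Build left/right global and local views for one pairwise sample."""
    return {
        "left_global": left_img,
        "right_global": right_img,
        "left_local": center_crop(left_img, frac),
        "right_local": center_crop(right_img, frac),
    }


def ensemble_preference(p_left_per_backbone):
    """Average each backbone's P(left is preferred) and pick a side.

    Assumption: unweighted averaging; the actual aggregation rule is not
    specified in the abstract.
    """
    p_left = mean(p_left_per_backbone)
    return ("left" if p_left >= 0.5 else "right"), p_left


if __name__ == "__main__":
    img = [[r * 4 + c for c in range(4)] for r in range(4)]
    views = decompose_pair(img, img)
    print(sorted(views))
    print(ensemble_preference([0.7, 0.6, 0.4]))
```

In a real pipeline each view would be fed to a backbone that outputs a preference probability; here the probabilities are supplied directly so that the aggregation step can be read in isolation.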
