TL;DR
DiffusionReward introduces a reward feedback learning framework for blind face restoration, leveraging a trained Face Reward Model to improve facial detail realism and identity consistency over existing diffusion-based methods.
Contribution
This work is the first to apply reward feedback learning to blind face restoration. It augments diffusion-based restoration models with a dynamically updated reward model to improve facial detail realism and identity consistency.
Findings
Outperforms state-of-the-art methods in identity consistency
Improves facial detail realism in restored images
Maintains generative diversity and facial fidelity
Abstract
Reward Feedback Learning (ReFL) has recently shown great potential in aligning model outputs with human preferences across various generative tasks. In this work, we introduce a ReFL framework, named DiffusionReward, to the Blind Face Restoration task for the first time. DiffusionReward effectively overcomes the limitations of diffusion-based methods, which often fail to generate realistic facial details and exhibit poor identity consistency. The core of our framework is the Face Reward Model (FRM), which is trained using carefully annotated data. It provides feedback signals that play a pivotal role in steering the optimization process of the restoration network. In particular, our ReFL framework incorporates a gradient flow into the denoising process of off-the-shelf face restoration methods to guide the update of model parameters. The guiding gradient is collaboratively determined by…
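The abstract's central mechanism lets a reward gradient flow through the denoising process into the restoration network's parameters. A one-dimensional toy can illustrate it. This is a hedged sketch, not DiffusionReward's implementation. The linear noise predictor `eps = theta * x_t`, the quadratic `toy_reward`, and all numeric values are hypothetical stand-ins. Only the standard DDPM identity for predicting x0 from x_t, x0 = (x_t - sqrt(1 - alpha_bar) * eps) / sqrt(alpha_bar), is borrowed from the diffusion literature.

```python
import math

def predict_x0(x_t, theta, alpha_bar):
    # Hypothetical toy noise predictor eps_theta(x_t) = theta * x_t (stands in for a U-Net).
    eps = theta * x_t
    # Standard DDPM one-step estimate of the clean sample from x_t.
    return (x_t - math.sqrt(1.0 - alpha_bar) * eps) / math.sqrt(alpha_bar)

def toy_reward(x0, target):
    # Hypothetical differentiable reward: larger when x0 is closer to target.
    return -(x0 - target) ** 2

def reward_grad_wrt_theta(x_t, theta, alpha_bar, target):
    # Chain rule through the denoising step: dR/dtheta = dR/dx0 * dx0/dtheta.
    x0 = predict_x0(x_t, theta, alpha_bar)
    dR_dx0 = -2.0 * (x0 - target)
    dx0_dtheta = -math.sqrt(1.0 - alpha_bar) * x_t / math.sqrt(alpha_bar)
    return dR_dx0 * dx0_dtheta

def refl_step(theta, x_t, alpha_bar, target, lr):
    # Gradient ascent on the reward (equivalently, descent on -R).
    return theta + lr * reward_grad_wrt_theta(x_t, theta, alpha_bar, target)

if __name__ == "__main__":
    theta = 0.0
    for _ in range(100):
        theta = refl_step(theta, x_t=1.0, alpha_bar=0.5, target=0.8, lr=0.1)
    print(theta, toy_reward(predict_x0(1.0, theta, 0.5), 0.8))
```

In a real network the hand-written chain rule would be replaced by automatic differentiation. The structure stays the same: the reward is evaluated on the denoised estimate, and its gradient updates the restoration model's weights.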
Peer Reviews
Decision · Submitted to ICLR 2026
Novelty and Significance: 1) This is the first work to adapt ReFL for BFR, bridging a gap between generative alignment and restoration tasks. The dynamic FRM update strategy effectively mitigates reward hacking, a common pitfall in reward-based optimization. 2) The hybrid annotation pipeline (combining human labels with SVM-based automation) for FRM training is resource-efficient and scalable.
Technical Rigor: 1) The framework incorporates multiple loss terms (reward loss, structural consist
Clarity of FRM Training Details: The description of the SVM-based automated annotation (Section 3.1) is concise but lacks critical specifics (e.g., kernel choice, feature normalization). Please add a brief summary in the main text or refer to Appendix A.1 for clarity.
Limitations of Generalizability: The framework is only validated on diffusion-based models (DiffBIR/OSEDiff). The ablation in Appendix G.1 shows limited gains when applied to GFPGAN (GAN-based). This should be explicitly discussed
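The review above notes that the SVM-based automated annotation leaves the kernel and the feature normalization unspecified. A minimal sketch can make that question concrete. It shows one possible reading of a hybrid pipeline, in which a small human-labeled set trains a classifier that then labels the remaining samples. Every choice here is an assumption, not the paper's method:

- a linear kernel trained with the Pegasos sub-gradient method;
- the bias absorbed as a constant feature;
- z-score normalization fitted on the labeled set;
- hypothetical 2-D feature vectors standing in for whatever image features are actually used;
- labels in {-1, +1}.

```python
import random
import statistics

def zscore_fit(rows):
    # Per-feature mean and population std; a zero std falls back to 1.0.
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return means, stds

def zscore_apply(rows, means, stds):
    # Normalize each feature, then append a constant 1.0 so the bias lives in w.
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] + [1.0] for r in rows]

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    # Pegasos: stochastic sub-gradient descent on hinge loss + (lam/2)||w||^2.
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    idx = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def auto_label(w, means, stds, rows):
    # Hypothetical automated annotation: sign of the linear score.
    Z = zscore_apply(rows, means, stds)
    return [1 if sum(wj * zj for wj, zj in zip(w, z)) >= 0 else -1 for z in Z]

if __name__ == "__main__":
    # Hypothetical human-labeled features (+1 = acceptable, -1 = unacceptable).
    X_raw = [[2.0, 3.0], [3.0, 2.5], [2.5, 3.5], [3.2, 3.1],
             [0.1, 0.2], [0.5, -0.3], [-0.2, 0.4], [0.3, 0.0]]
    y = [1, 1, 1, 1, -1, -1, -1, -1]
    means, stds = zscore_fit(X_raw)
    w = train_linear_svm(zscore_apply(X_raw, means, stds), y)
    print(auto_label(w, means, stds, [[2.8, 3.0], [0.0, 0.1]]))
```

The normalization statistics are fitted once on the labeled set and reused when labeling new samples. This is exactly the kind of detail the review asks the authors to state.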
1. A dynamic reward update strategy to mitigate reward hacking; 2. Structural consistency and weight-regularization constraints to preserve identity and maintain generative diversity; 3. End-to-end ReFL training integrated into the denoising process.
1. The core contribution of this work is to introduce the reward feedback mechanism into BFR. However, the experimental results do not clearly demonstrate the necessity and advantages of the reward model. Specifically focusing on the ablation studies of Table 4, i) the LMD improvement mainly comes from the structural consistency loss (compare Base with Variant 1); ii) the MUSIQ and Aesthetic improvement is primarily controlled by KL weight regularization (compare Variant 3 with Ours) as this KL
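The review above attributes the LMD gains to the structural consistency loss and the MUSIQ/Aesthetic gains to KL weight regularization. It therefore helps to see how such terms could combine with the reward in one objective. The sketch below is hypothetical. The weighted-sum form and the `lambda_*` coefficients are assumptions, not the paper's definitions. It also reads "KL weight regularization" as a KL divergence between isotropic Gaussians centered on the fine-tuned and pretrained weights with a shared `sigma`, which reduces to a scaled squared L2 distance. `struct_loss` is taken as a precomputed scalar.

```python
def gaussian_weight_kl(weights, ref_weights, sigma):
    # KL( N(weights, sigma^2 I) || N(ref_weights, sigma^2 I) ) = ||w - w_ref||^2 / (2 sigma^2)
    sq = sum((w - r) ** 2 for w, r in zip(weights, ref_weights))
    return sq / (2.0 * sigma ** 2)

def refl_objective(reward, struct_loss, weights, ref_weights,
                   lambda_reward=1.0, lambda_struct=1.0, lambda_kl=0.1, sigma=1.0):
    # Hypothetical total loss to minimize: maximize reward, keep structure,
    # and stay close to the pretrained weights.
    return (-lambda_reward * reward
            + lambda_struct * struct_loss
            + lambda_kl * gaussian_weight_kl(weights, ref_weights, sigma))

if __name__ == "__main__":
    print(refl_objective(reward=0.5, struct_loss=0.2,
                         weights=[1.0, 2.0], ref_weights=[1.0, 0.0]))
```

Under this reading, the ablation in question amounts to zeroing one coefficient at a time. Setting `lambda_reward=0` would isolate how much the reward term contributes beyond the two constraints.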
1. Effective Adaptation: The proposed application of Reward Feedback Learning (ReFL) to Blind Face Restoration (BFR) is novel and compelling. It presents a computationally efficient fine-tuning strategy that demonstrably enhances the performance of powerful pre-trained diffusion models, as evidenced by the clear improvements in both visual quality and quantitative metrics reported. 2. Well-Designed Reward Model: The construction of the Face Reward Model (FRM) is a significant contribution.
1. Potential Cherry-Picking of Qualitative Results: A common concern in image restoration is the potential for cherry-picking qualitative results. While the provided examples are compelling, it would significantly strengthen the validity of the claims if the supplementary material included a more extensive set of randomly selected samples (e.g., the first 20 images from a standard benchmark like FFHQ) comparing the base models (OSEDiff, DiffBIR) against their fine-tuned versions with the pro
