DiffusionReward: Enhancing Blind Face Restoration through Reward Feedback Learning

Bin Wu; Wei Wang; Yahui Liu; Zixiang Li; Yao Zhao

arXiv:2505.17910·cs.CV·May 26, 2025

3 Reviews

TL;DR

DiffusionReward introduces a reward feedback learning framework for blind face restoration, leveraging a trained Face Reward Model to improve facial detail realism and identity consistency over existing diffusion-based methods.

Contribution

This work is the first to apply reward feedback learning to blind face restoration. It fine-tunes diffusion-based restoration models using feedback from a dynamically updated Face Reward Model.

Findings

01. Outperforms state-of-the-art methods in identity consistency

02. Improves facial detail realism in restored images

03. Maintains generative diversity and facial fidelity

Abstract

Reward Feedback Learning (ReFL) has recently shown great potential in aligning model outputs with human preferences across various generative tasks. In this work, we introduce a ReFL framework, named DiffusionReward, to the Blind Face Restoration task for the first time. DiffusionReward effectively overcomes the limitations of diffusion-based methods, which often fail to generate realistic facial details and exhibit poor identity consistency. The core of our framework is the Face Reward Model (FRM), which is trained using carefully annotated data. It provides feedback signals that play a pivotal role in steering the optimization process of the restoration network. In particular, our ReFL framework incorporates a gradient flow into the denoising process of off-the-shelf face restoration methods to guide the update of model parameters. The guiding gradient is collaboratively determined by…
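As a rough illustration of the reward-feedback idea described in the abstract, the sketch below lets the gradient of a frozen reward flow back into the single parameter of a toy restorer. Everything in it is an assumption made for illustration: the names `restore`, `reward`, and `refl_step`, the one-parameter gain model, and the quadratic scorer standing in for the learned FRM. It is not the paper's implementation.

```python
# Toy reward feedback learning: all names and models here are hypothetical.

def restore(theta, lq):
    """Toy one-step 'restorer': scales each degraded value by a single gain."""
    return [theta * v for v in lq]

def reward(y, preferred):
    """Frozen stand-in reward: higher when the output is nearer `preferred`."""
    return -sum((a - b) ** 2 for a, b in zip(y, preferred)) / len(y)

def refl_step(theta, lq, preferred, lr=0.5):
    """One gradient-ascent step on the reward with respect to theta."""
    y = restore(theta, lq)
    # Chain rule: dR/dy_i = -2 (y_i - p_i) / n and dy_i/dtheta = lq_i.
    grad = sum(-2.0 * (a - b) / len(y) * v for a, b, v in zip(y, preferred, lq))
    return theta + lr * grad

lq = [0.2, 0.4, 0.6]          # made-up degraded signal
preferred = [0.4, 0.8, 1.2]   # made-up output the frozen reward favours

theta = 1.0
reward_before = reward(restore(theta, lq), preferred)
for _ in range(50):
    theta = refl_step(theta, lq, preferred)
reward_after = reward(restore(theta, lq), preferred)
```

In this toy setup the reward model's parameters are never touched; only the restorer's parameter moves, and it moves in whatever direction the reward gradient indicates.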

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 3

Strengths

Novelty and Significance: 1) This is the first work to adapt ReFL for BFR, bridging a gap between generative alignment and restoration tasks. The dynamic FRM update strategy effectively mitigates reward hacking, a common pitfall in reward-based optimization. 2) The hybrid annotation pipeline (combining human labels with SVM-based automation) for FRM training is resource-efficient and scalable.

Technical Rigor: 1) The framework incorporates multiple loss terms (reward loss, structural consist

Weaknesses

Clarity of FRM Training Details: The description of the SVM-based automated annotation (Section 3.1) is concise but lacks critical specifics (e.g., kernel choice, feature normalization). Please add a brief summary in the main text or refer to Appendix A.1 for clarity.

Limitations of Generalizability: The framework is only validated on diffusion-based models (DiffBIR/OSEDiff). The ablation in Appendix G.1 shows limited gains when applied to GFPGAN (GAN-based). This should be explicitly discussed
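To make concrete the kind of specifics this review asks for regarding the SVM-based annotation, here is a minimal sketch of SVM pseudo-labelling with an explicit kernel choice (linear) and explicit feature normalization (z-scoring). The features, labels, hyperparameters, and the helper names `standardize`, `train_linear_svm`, and `predict` are all hypothetical; the paper's actual settings are not given in this excerpt.

```python
# Hypothetical pseudo-labelling sketch; features, labels, and settings are made up.

def standardize(rows, stats=None):
    """Z-score each feature column, reusing (means, stds) when provided."""
    if stats is None:
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        stds = [max((sum((v - m) ** 2 for v in c) / len(c)) ** 0.5, 1e-8)
                for c, m in zip(cols, means)]
        stats = (means, stds)
    means, stds = stats
    scaled = [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]
    return scaled, stats

def train_linear_svm(xs, ys, lr=0.1, lam=0.01, epochs=50):
    """Subgradient descent on the L2-regularised hinge loss (labels are +1/-1)."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            w = [(1.0 - lr * lam) * wj for wj in w]      # weight decay
            if margin < 1.0:                             # sample violates the margin
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Return +1 (preferred) or -1 (rejected) for one normalised feature row."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Human-labelled seed set: two made-up quality scores per image.
human_x = [[0.9, 0.8], [0.8, 0.9], [0.85, 0.75], [0.2, 0.3], [0.3, 0.1], [0.1, 0.2]]
human_y = [1, 1, 1, -1, -1, -1]          # +1 preferred, -1 rejected
unlabeled_x = [[0.95, 0.9], [0.15, 0.15]]

train_x, stats = standardize(human_x)
w, b = train_linear_svm(train_x, human_y)
test_x, _ = standardize(unlabeled_x, stats)   # reuse the training statistics
pseudo_labels = [predict(w, b, x) for x in test_x]
```

Reusing the training-set statistics when normalizing unlabeled rows keeps both sets on the same scale, which is one of the details a reader would need in order to reproduce such a pipeline.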

Reviewer 02 · Rating 4 · Confidence 4

Strengths

1. A dynamic reward update strategy to mitigate reward hacking; 2. Structural consistency and weight-regularization constraints to preserve identity and maintain generative diversity; 3. End-to-end ReFL training integrated into the denoising process.
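One way to read how the three ingredients listed above might combine is as a single weighted training objective. The form below is only an assumed sketch: the symbols $G_\theta$ (restoration network), $\theta_0$ (its pre-trained weights), $R_\phi$ (the Face Reward Model), $x_{\mathrm{lq}}$ (the degraded input), and the weights $\lambda_r, \lambda_s, \lambda_w$ are all hypothetical, as are the arguments of each term. The paper's exact formulation is not reproduced in this excerpt.

```latex
\mathcal{L}(\theta)
  = -\lambda_r \, R_\phi\!\big(G_\theta(x_{\mathrm{lq}})\big)
  + \lambda_s \, \mathcal{L}_{\mathrm{struct}}\!\big(G_\theta(x_{\mathrm{lq}}),\, G_{\theta_0}(x_{\mathrm{lq}})\big)
  + \lambda_w \, \mathcal{L}_{\mathrm{reg}}(\theta, \theta_0)
```

Under this reading, the first term pushes outputs toward higher reward, the second discourages structural drift in the restored face, and the third keeps the fine-tuned weights close to the pre-trained ones.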

Weaknesses

1. The core contribution of this work is to introduce the reward feedback mechanism into BFR. However, the experimental results do not clearly demonstrate the necessity and advantages of the reward model. Focusing on the ablation study in Table 4: i) the LMD improvement comes mainly from the structural consistency loss (compare Base and Variant 1); ii) the MUSIQ and Aesthetic improvements are driven primarily by the KL weight regularization (compare Variant 3 and Ours) as this KL

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. Effective Adaptation: The proposed application of Reward Feedback Learning (ReFL) to Blind Face Restoration (BFR) is novel and compelling. It presents a computationally efficient fine-tuning strategy that demonstrably enhances the performance of powerful pre-trained diffusion models, as evidenced by the clear improvements in both visual quality and quantitative metrics reported. 2. Well-Designed Reward Model: The construction of the Face Reward Model (FRM) is a significant contribution.

Weaknesses

1. Potential Cherry-Picking of Qualitative Results: A common concern in image restoration is the potential for cherry-picking qualitative results. While the provided examples are compelling, it would significantly strengthen the validity of the claims if the supplementary material included a more extensive set of randomly selected samples (e.g., the first 1-20 images from a standard benchmark like FFHQ) comparing the base models (OSEDiff, DiffBIR) against their fine-tuned versions with the pro
