Reference-Guided Identity Preserving Face Restoration
Mo Zhou, Keren Ye, Viraj Shah, Kangfu Mei, Mauricio Delbracio, Peyman Milanfar, Vishal M. Patel, Hossein Talebi

TL;DR
This paper presents a novel face restoration method that effectively utilizes reference faces to enhance identity preservation and image quality, introducing a comprehensive reference representation, a new loss function, and an inference adaptation technique.
Contribution
It introduces a composite context representation, a hard example identity loss, and a training-free inference adaptation, advancing reference-guided face restoration.
Findings
Achieves state-of-the-art results on FFHQ-Ref and CelebA-Ref-Test benchmarks.
Outperforms previous methods in identity preservation and image quality.
Demonstrates robustness with multi-reference inputs during inference.
Abstract
Preserving face identity is a critical yet persistent challenge in diffusion-based image restoration. While reference faces offer a path forward, existing reference-based methods often fail to fully exploit their potential. This paper introduces a novel approach that maximizes reference face utility for improved face restoration and identity preservation. Our method makes three key contributions: 1) Composite Context, a comprehensive representation that fuses multi-level (high- and low-level) information from the reference face, offering richer guidance than prior singular representations. 2) Hard Example Identity Loss, a novel loss function that leverages the reference face to address the identity learning inefficiencies found in the existing identity loss. 3) A training-free method to adapt the model to multi-reference inputs during inference. The proposed method demonstrably restores…
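The abstract does not give the formula for the Hard Example Identity Loss, so the following is only a minimal sketch of one plausible reading: a conventional identity loss (one minus the cosine similarity between face-recognition embeddings of the restored and ground-truth faces) with an extra term that compares the restored face against the reference face. The function names, the additive form, and the `ref_weight` parameter are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two 1-D embedding vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def identity_loss(restored_emb, target_emb):
    """Conventional identity loss: 1 - cos(restored, ground truth)."""
    return 1.0 - cosine_similarity(restored_emb, target_emb)


def hard_example_identity_loss(restored_emb, target_emb, reference_emb, ref_weight=1.0):
    """ASSUMED form, not the paper's definition: the ground-truth term plus a
    weighted term that also pulls the restored face toward the reference face."""
    gt_term = identity_loss(restored_emb, target_emb)
    ref_term = identity_loss(restored_emb, reference_emb)
    return gt_term + ref_weight * ref_term


# Toy embeddings (hypothetical values): the restored face matches the ground
# truth exactly, while the reference embedding is orthogonal to it.
restored = np.array([1.0, 0.0])
ground_truth = np.array([1.0, 0.0])
reference = np.array([0.0, 1.0])

plain = identity_loss(restored, ground_truth)
hard = hard_example_identity_loss(restored, ground_truth, reference)
```

In this toy case the conventional loss is already near zero and provides almost no gradient signal, while the reference term is still large. This is one way to read the "identity learning inefficiencies" that the abstract attributes to the existing identity loss.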
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
1. The paper is clearly written and well organized. 2. The overall framework is well structured: it effectively incorporates reference facial representations into the restoration pipeline and extends RefLDM models to better preserve identity similarity. 3. The experimental evaluation is comprehensive and detailed, providing strong empirical evidence for the method’s effectiveness.
1. The overall novelty is limited. The proposed framework is conceptually similar to several existing works, and the Hard Example Identity Loss is closely related to the loss function commonly used in [1]. Its main difference lies in the additional use of reference image information, which resembles strategies adopted in recent personalized generation methods [2]. 2. The Composite Context module relies heavily on face recognition features. It would be interesting to explore how the performance ch
1. Well-designed Composite Context that addresses multi-level information fusion. Unlike prior reference-based methods that only leverage partial information from reference faces, this work comprehensively combines: (a) high-level identity features via pre-trained ArcFace embeddings that enforce angular margin constraints; (b) general facial attributes via FaRL including skin texture, lighting, and semantic information; and (c) cross-attention projection through UNet for spatial alignment. The ab
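As a rough illustration of the fusion this review describes, the sketch below projects an identity embedding and a set of general face-attribute tokens to a shared width, concatenates them into one context sequence, and lets spatial features attend to that sequence. It is a minimal sketch under stated assumptions, not the paper's architecture: the random arrays stand in for real ArcFace and FaRL outputs, and all dimensions, the projection matrices `w_id` and `w_attr`, and the single-head attention without learned query/key/value projections are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def build_composite_context(id_emb, attr_tokens, w_id, w_attr):
    """Fuse a high-level identity embedding with general attribute tokens.

    id_emb:      (d_id,)      stand-in for an ArcFace-style identity embedding
    attr_tokens: (n, d_attr)  stand-in for FaRL-style feature tokens
    returns:     (1 + n, d_ctx) context sequence
    """
    id_token = (id_emb @ w_id)[None, :]   # one identity token
    attr_ctx = attr_tokens @ w_attr       # n attribute tokens
    return np.concatenate([id_token, attr_ctx], axis=0)


def cross_attention(queries, context):
    """Single-head cross-attention where the context serves as keys and values."""
    scores = queries @ context.T / np.sqrt(context.shape[-1])
    return softmax(scores, axis=-1) @ context


rng = np.random.default_rng(0)
d_id, d_attr, d_ctx = 512, 768, 64        # hypothetical sizes
id_emb = rng.normal(size=d_id)
attr_tokens = rng.normal(size=(4, d_attr))
w_id = rng.normal(size=(d_id, d_ctx)) * 0.02
w_attr = rng.normal(size=(d_attr, d_ctx)) * 0.02

context = build_composite_context(id_emb, attr_tokens, w_id, w_attr)
queries = rng.normal(size=(16, d_ctx))    # stand-in for flattened UNet features
attended = cross_attention(queries, context)
```

Concatenating along the token axis keeps the identity and attribute streams separate, so each spatial location can weight them independently.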
1. Limited novelty in individual technical components: while the overall system is effective, each core component builds heavily on existing techniques, so the contribution feels more like good engineering than fundamental innovation. 2. Insufficient analysis of the multi-reference degradation phenomenon: Table 2 reveals a counterintuitive and concerning result: identity similarity IDS(REF) sometimes decreases with more reference faces. This directly contradicts the fundamental premise that more referen
Well-Motivated: The core ideas are reasonable in the context of reference-based face restoration. The critique of the "learning inefficiency" of standard identity loss is insightful, and the proposed HID loss is a simple yet effective solution. Empirical Evidence: The paper provides extensive quantitative evaluations on multiple benchmarks (FFHQ-Ref Moderate/Severe, CelebA-Ref-Test), consistently showing superior performance in identity preservation (IDS, FaceNet) while maintaining competitive
Limited technical novelty: While the paper proposes two modules—Composite Context (CC) and Hard Example Identity Loss (HID)—the technical innovations appear incremental. The CC module combines existing face representations (ArcFace and FaRL), which is conceptually similar to prior multi-modal fusion approaches (e.g., SDXL, PGDiff). Though the authors claim to be the first to combine specialized face encoders for restoration, this primarily constitutes an engineering integration rather than a fu
+ The paper identifies the common issue of insufficient identity preservation in diffusion-based face restoration and addresses it via composite context (face recognition and representation) and hard example identity loss. + The two modules are orthogonal, making the approach easily integrable with other LDM backbones. + The authors provide results against both reference-based (RefLDM, RestorerID) and no-reference methods (DiffBIR, CodeFormer), with great metrics and ablations.
- Limited novelty over existing IP-Adapter-like paradigms and other diffusion-based face restoration models. The proposed Composite Context essentially acts as a fixed feature adaptor combining ArcFace and FaRL representations. This is conceptually close to IP-Adapter-style feature injection, except with multiple frozen encoders. The claimed advantage of mixing high-level and general representations is not convincingly demonstrated: visual results do not show clear benefits from these two branche
Taxonomy
Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques
