Latent Posterior-Mean Rectified Flow for Higher-Fidelity Perceptual Face Restoration
Xin Luo, Menglin Zhang, Yunwei Lan, Tianyu Zhang, Rui Li, Chang Liu, Dong Liu

TL;DR
This paper introduces Latent-PMRF, a novel face restoration method that reformulates PMRF in the VAE latent space, improving perceptual quality, fidelity, and efficiency in blind face restoration tasks.
Contribution
It proposes a latent space reformulation of PMRF using a VAE, enhancing alignment with human perception and significantly improving restoration performance and speed.
Findings
Outperforms existing methods in perceptual quality and fidelity.
Achieves 5.79× faster convergence, as measured by FID.
Demonstrates superior PD-tradeoff in blind face restoration.
Abstract
The Perception-Distortion tradeoff (PD-tradeoff) theory suggests that face restoration algorithms must balance perceptual quality and fidelity. To achieve minimal distortion while maintaining perfect perceptual quality, Posterior-Mean Rectified Flow (PMRF) proposes a flow-based approach in which the source distribution consists of minimum-distortion estimates. Although PMRF has been shown to be effective, its pixel-space modeling limits its ability to align with human perception, defined here as how humans distinguish between two image distributions. In this work, we propose Latent-PMRF, which reformulates PMRF in the latent space of a variational autoencoder (VAE), facilitating better alignment with human perception during optimization. By defining the source distribution on latent representations of the minimum-distortion estimate, we bound the minimum distortion by the VAE's…
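As a rough illustration of the pipeline the abstract describes, the following Python sketch encodes a minimum-distortion estimate into a latent, integrates a velocity field with Euler steps, and decodes the result. It is a toy under stated assumptions, not the authors' implementation: `encode`, `decode`, `posterior_mean_estimate`, `latent_pmrf_sample`, and `toy_velocity` are hypothetical names, the encoder and decoder are simple linear maps, and the velocity field is a closed-form stand-in for what would be a trained network.

```python
import numpy as np

def encode(x):
    # Hypothetical stand-in for a VAE encoder: a fixed invertible linear map.
    return 0.5 * x

def decode(z):
    # Hypothetical stand-in for a VAE decoder (exact inverse of `encode`).
    return 2.0 * z

def posterior_mean_estimate(y):
    # Hypothetical minimum-distortion restorer; the identity keeps the toy simple.
    return y

def latent_pmrf_sample(y, velocity_fn, num_steps=8):
    """Euler-integrate a latent flow that starts at the encoded estimate."""
    z = encode(posterior_mean_estimate(y))  # source point in latent space
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        z = z + dt * velocity_fn(z, t)      # one explicit Euler step
    return decode(z)                        # map the final latent back to pixels

# Toy setup: a known clean signal and a degraded observation of it.
x_clean = np.array([1.0, -2.0, 3.0])
y = x_clean + 0.5
z_target = encode(x_clean)

def toy_velocity(z, t):
    # Closed-form velocity of a straight-line path that ends at z_target.
    # A trained model would predict this from (z, t) without access to z_target.
    return (z_target - z) / (1.0 - t)

restored = latent_pmrf_sample(y, toy_velocity, num_steps=8)
print(restored)
```

Because the toy path is a straight line, the Euler integration lands on the target latent regardless of the step count; with a learned velocity field, the number of steps would trade speed against accuracy.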
Peer Reviews
Decision · ICLR 2026 Conference Withdrawn Submission
+ Enhanced Perceptual Alignment and Efficiency: By reformulating the PMRF model in a learned latent space rather than pixel space, the method achieves superior alignment with human perception, leading to a significant 5.79× speedup in convergence as measured by FID.
+ Theoretically-Bounded Distortion with a Superior VAE: The novel design of the source distribution in Latent-PMRF guarantees that the minimum achievable distortion is bounded by the VAE's reconstruction error. This advantage is furt
- There are no ablation studies that separately verify the effects of Latent-PMRF and Sim-VAE.
- The organization is ugly. In the abstract, I do not know what Sim-VAE means.
- When compared with other methods, some performance metrics are not always the best, and the authors do not explain why. Additionally, the runtime and the number of parameters do not improve on existing methods.
- The datasets used for evaluation are simple. Each image contains one simple face. In a
1. Latent-PMRF leverages the VAE's latent space, enabling more effective optimization aligned with human perception. This results in better perceptual quality and faster convergence compared to PMRF in pixel space.
2. The reformulation of PMRF in latent space, combined with the use of a Sim-VAE that enhances VAE reconstruction fidelity, provides a significant boost to restoration performance, addressing challenges in face restoration tasks effectively.
3. The method accelerates the training pr
1. The removal of self-attention in Sim-VAE’s middle layers, while improving resolution generalization, may sacrifice the model’s ability to capture long-range contextual dependencies in facial images, limiting performance on complex facial structures or textures.
2. Sim-VAE’s training relies on a fixed combination of LSDIR and FFHQ datasets, lacking validation on diverse data distributions (e.g., low-light, extreme poses) which may hinder its generalization to real-world edge cases.
3. Despite
1. The paper proposes Latent-PMRF, which implements Posterior-Mean Rectified Flow (PMRF) in VAE latent space, motivated by the better alignment of feature-space distances with human perception compared to pixel space.
2. It distinguishes between two latent-space source distributions (latent of the posterior mean vs. posterior mean of the latent) and shows that the former preserves the minimum-distortion property, with theoretical support.
3. The authors introduce Sim-VAE, a streamlined variant of
1. The inference speed of Latent-PMRF is slightly slower than PMRF due to VAE encoding/decoding, but the tradeoff between this overhead and perceptual gains is not quantitatively analyzed.
2. While Sim-VAE improves performance, the ablation study lacks qualitative analysis (e.g., feature map visualization) to explain why certain architectural changes, such as replacing GroupNorm with LayerNorm, are beneficial.
3. The comparison with concurrent work ELIR is indirect; a direct experimental comparison w
1. The authors propose a new architecture for VAEs, which yields much lower distortion when encoding-decoding face images, compared to the VAEs that are typically used by diffusion models.
2. The proposed Latent-PMRF can achieve similar performance to PMRF, yet it requires a much lower training time. This is reasonable since the authors pre-train a VAE.
3. The paper is overall well written and very easy to follow.
4. The proposed solution is somewhat original: most diffusion-based image restora
1. While Latent-PMRF requires lower training times, it still relies on a VAE tailored for face images, which by itself requires training. If the sole purpose of the VAE is to serve the Latent-PMRF model (because the trained Sim-VAE is designed for face images), then it's not immediately clear whether Latent-PMRF requires less training -- the authors must also report the training times of Sim-VAE, and add them to those of Latent-PMRF. However, if Latent-PMRF can also work well for general-purpose
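One review above notes that the paper distinguishes two latent-space source distributions: the latent of the posterior mean and the posterior mean of the latent. The toy Python example below shows why the two can differ once the encoder is nonlinear. It does not reproduce the paper's argument; the `tanh`/`arctanh` encoder-decoder pair, the two-point posterior, and all variable names are assumptions chosen only to make the arithmetic easy to follow.

```python
import numpy as np

# Hypothetical encoder/decoder pair: nonlinear, and exactly invertible.
encode = np.tanh
decode = np.arctanh

# Assumed toy posterior: two equally likely clean signals for one degraded input.
posterior_samples = np.array([0.0, 2.0])
x_mean = posterior_samples.mean()  # posterior mean in pixel space

# Option A: encode the posterior mean (latent of the posterior mean).
z_latent_of_mean = encode(x_mean)
# Option B: average the encoded samples (posterior mean of the latent).
z_mean_of_latent = encode(posterior_samples).mean()

def mse_to_posterior(x_hat):
    # Expected squared error of a point estimate under the toy posterior.
    return np.mean((posterior_samples - x_hat) ** 2)

mse_a = mse_to_posterior(decode(z_latent_of_mean))
mse_b = mse_to_posterior(decode(z_mean_of_latent))

print("latent of posterior mean:", z_latent_of_mean, "decoded MSE:", mse_a)
print("posterior mean of latent:", z_mean_of_latent, "decoded MSE:", mse_b)
```

In this toy, decoding option A recovers the pixel-space posterior mean exactly, because the decoder inverts the encoder, so its error is the smallest a point estimate can achieve here. Option B decodes to a different point with a larger error. With a real VAE the decoder is only an approximate inverse, which is consistent with the reviewers' description of the distortion being bounded by the VAE's reconstruction error.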
Taxonomy
Topics: Face recognition and analysis · Advanced Image Processing Techniques · Generative Adversarial Networks and Image Synthesis
