LatRef-Diff: Latent and Reference-Guided Diffusion for Facial Attribute Editing and Style Manipulation
Wenmin Huang, Weiqi Luo, Xiaochun Cao, and Jiwu Huang

TL;DR
LatRef-Diff is a diffusion-based framework that enables precise facial attribute editing and style manipulation using style codes generated via latent and reference guidance, improving control and image quality.
Contribution
It introduces a style modulation module for diffusion models, two guidance methods (latent and reference) for generating style codes, and a training strategy that improves stability without requiring paired images.
Findings
Achieves state-of-the-art results on CelebA-HQ for facial attribute editing.
Delivers effective style manipulation with high image quality.
Improves training stability through forward-backward consistency.
Abstract
Facial attribute editing and style manipulation are crucial for applications like virtual avatars and photo editing. However, achieving precise control over facial attributes without altering unrelated features is challenging due to the complexity of facial structures and the strong correlations between attributes. While conditional GANs have shown progress, they are limited by accuracy issues and training instability. Diffusion models, though promising, face challenges in style manipulation due to the limited expressiveness of semantic directions. In this paper, we propose LatRef-Diff, a novel diffusion-based framework that addresses these limitations. We replace the traditional semantic directions in diffusion models with style codes and propose two methods for generating them: latent and reference guidance. Based on these style codes, we design a style modulation module that…
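
To make the two guidance routes concrete, here is a minimal NumPy sketch of how a style code might be produced from a random latent vector or from a reference image and then used to modulate feature maps. The abstract excerpt above is cut off before it describes the actual module, so everything here is an assumption for illustration: the AdaIN-style scale-and-shift, the function names (`style_from_latent`, `style_from_reference`, `modulate`), the dimensions, and the random weights standing in for trained networks are all hypothetical and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

STYLE_DIM = 8    # hypothetical style-code size
LATENT_DIM = 4   # hypothetical latent size
CHANNELS = 3     # number of feature channels being modulated


def init_linear(n_in, n_out):
    """Random weights standing in for trained parameters (assumption)."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


# Hypothetical stand-ins for the two style-code generators and the modulator.
W_map, b_map = init_linear(LATENT_DIM, STYLE_DIM)     # latent guidance
W_enc, b_enc = init_linear(CHANNELS, STYLE_DIM)       # reference guidance
W_mod, b_mod = init_linear(STYLE_DIM, 2 * CHANNELS)   # style -> (scale, shift)


def style_from_latent(z):
    """Latent guidance: map a random latent vector to a style code."""
    return np.tanh(z @ W_map + b_map)


def style_from_reference(ref_img):
    """Reference guidance: pool a (C, H, W) reference image, then project it."""
    pooled = ref_img.mean(axis=(1, 2))
    return np.tanh(pooled @ W_enc + b_enc)


def modulate(features, style, eps=1e-5):
    """AdaIN-style modulation (assumed): normalize per channel, scale, shift."""
    scale, shift = np.split(style @ W_mod + b_mod, 2)
    mean = features.mean(axis=(1, 2), keepdims=True)
    std = features.std(axis=(1, 2), keepdims=True)
    normed = (features - mean) / (std + eps)
    return (1.0 + scale)[:, None, None] * normed + shift[:, None, None]


features = rng.normal(size=(CHANNELS, 16, 16))
s_lat = style_from_latent(rng.normal(size=LATENT_DIM))
s_ref = style_from_reference(rng.normal(size=(CHANNELS, 16, 16)))
out_lat = modulate(features, s_lat)   # style sampled from a latent
out_ref = modulate(features, s_ref)   # style copied from a reference image
```

Both routes yield a style code of the same shape, so a single modulation function can consume either one; that interchangeability is the only property this sketch is meant to show.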
