ExpertGen: Training-Free Expert Guidance for Controllable Text-to-Face Generation

Liang Shi; Yun Fu

arXiv:2505.17256·cs.CV·May 26, 2025

TL;DR

ExpertGen is a training-free method that uses pre-trained expert models to achieve fine-grained, simultaneous control over multiple facial features in diffusion-based face generation.

Contribution

It leverages pre-trained expert models as guidance signals in a diffusion framework, enabling flexible, multi-attribute face generation without extra training modules.

Findings

1. Expert guidance improves control accuracy in face generation.
2. Multiple experts can collaborate for multi-attribute control.
3. The method is flexible and resource-efficient.

Abstract

Recent advances in diffusion models have significantly improved text-to-face generation, but achieving fine-grained control over facial features remains a challenge. Existing methods often require training additional modules to handle specific controls such as identity, attributes, or age, making them inflexible and resource-intensive. We propose ExpertGen, a training-free framework that leverages pre-trained expert models such as face recognition, facial attribute recognition, and age estimation networks to guide generation with fine control. Our approach uses a latent consistency model to ensure realistic and in-distribution predictions at each diffusion step, enabling accurate guidance signals to effectively steer the diffusion process. We show qualitatively and quantitatively that expert models can guide the generation process with high precision, and multiple experts can…
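The abstract describes expert guidance as follows: an expert's loss is evaluated on a clean, in-distribution prediction at each step, and its gradient steers the noisy latent. The toy sketch below illustrates that general gradient-based guidance pattern. Everything in it is an assumption for illustration and is not the authors' code. `predict_clean` is a hypothetical linear stand-in for the latent consistency model. `age_expert` and `attribute_expert` are hypothetical quadratic losses, not real face recognition or age estimation networks. Finite differences stand in for autograd.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], float]


def predict_clean(x_t: Vector, t: float) -> Vector:
    # Hypothetical stand-in for a latent consistency model: maps the noisy
    # latent at time t (0 < t <= 1) directly to a clean-latent estimate.
    return [v * (1.0 - 0.5 * t) for v in x_t]


def combined_loss(x0: Vector, experts: Sequence[Expert], weights: Sequence[float]) -> float:
    # Multi-expert guidance: weighted sum of each expert's loss on the clean estimate.
    return sum(w * expert(x0) for expert, w in zip(experts, weights))


def numerical_grad(f: Callable[[Vector], float], x: Vector, eps: float = 1e-5) -> Vector:
    # Central finite differences; a real system would backpropagate instead.
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2.0 * eps))
    return grad


def guided_step(x_t: Vector, t: float, experts: Sequence[Expert],
                weights: Sequence[float], scale: float) -> Vector:
    # The loss is measured on the clean prediction but differentiated with
    # respect to x_t, so the gradient passes through predict_clean.
    def loss_of_xt(x: Vector) -> float:
        return combined_loss(predict_clean(x, t), experts, weights)

    g = numerical_grad(loss_of_xt, x_t)
    # Nudge the noisy latent against the gradient of the combined expert loss.
    return [v - scale * gi for v, gi in zip(x_t, g)]


def age_expert(x0: Vector) -> float:
    # Hypothetical expert: wants coordinate 0 of the clean latent near 0.8.
    return (x0[0] - 0.8) ** 2


def attribute_expert(x0: Vector) -> float:
    # Hypothetical expert: wants coordinate 1 of the clean latent near -0.3.
    return (x0[1] + 0.3) ** 2


experts = [age_expert, attribute_expert]
weights = [1.0, 1.0]
t = 0.5
x = [0.0, 0.5]
before = combined_loss(predict_clean(x, t), experts, weights)
for _ in range(50):
    # In a full sampler each guidance update would be interleaved with the
    # usual denoising update; only the guidance part is shown here.
    x = guided_step(x, t, experts, weights, scale=0.5)
after = combined_loss(predict_clean(x, t), experts, weights)
print(f"loss before: {before:.4f}, after: {after:.2e}")
```

In this sketch, multiple experts are combined as a weighted sum of their losses. The `weights` list is a hypothetical knob for trading off one attribute against another. The key point is that the gradient is taken through the clean prediction. If the predictor were poor or out of distribution, the experts would score unrealistic inputs and the guidance signal would be unreliable, which is the motivation the abstract gives for using a consistency model.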
