When Few Steps Are Enough: Training-Free Acceleration of Identity-Preserved Generation
Dongqi Zheng

TL;DR
This paper shows that a frozen identity adapter trained on the FLUX dev backbone transfers to the distilled schnell backbone without retraining, enabling fast, high-fidelity identity-preserved image generation and cutting latency by 5.9x.
Contribution
It introduces a training-free backbone replacement method that maintains identity fidelity and improves efficiency in identity-preserved image generation.
Findings
Identity fidelity enters an early effective regime, often within 4-8 steps.
Replacing the dev backbone with schnell reduces latency by 5.9x while improving ArcFace identity similarity by 0.028 and lowering LPIPS by 0.016 relative to the 28-step dev baseline.
Later steps primarily refine visual detail, sharpness, and contrast rather than identity.
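The identity metric in the findings can be made concrete with a small sketch. ArcFace identity similarity is commonly computed as the cosine similarity between face embeddings of the reference and the generated image; the listing does not spell out the exact protocol, so that is an assumption here. The helper `first_step_within` and its tolerance are hypothetical, introduced only to illustrate what "entering an early effective regime" could mean for a per-step similarity curve.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def first_step_within(similarity_by_step, tolerance):
    """Earliest step count whose similarity is within `tolerance` of the
    similarity at the largest step count (hypothetical criterion)."""
    steps = sorted(similarity_by_step)
    final = similarity_by_step[steps[-1]]
    for step in steps:
        if final - similarity_by_step[step] <= tolerance:
            return step
    return steps[-1]
```

Given a mapping from step count to measured similarity, `first_step_within(curve, tolerance)` returns the smallest step count that is already close to the final score. Any curve passed in would come from your own measurements; no values from the paper are reproduced here.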
Abstract
Identity-preserved image generation is typically built on many-step diffusion backbones, making personalized generation expensive at deployment time. We show that this cost is often unnecessary for identity-conditioned FLUX generation. A frozen InfuseNet identity adapter trained with dev transfers directly to the distilled schnell backbone without retraining. This two-line replacement (changing the backbone path and disabling classifier-free guidance) reduces latency by 5.9x while improving ArcFace identity similarity by +0.028 and LPIPS by -0.016 over the standard 28-step dev baseline. To explain why this works, we analyze the denoising trajectory and find that identity fidelity enters an early effective regime, often within 4-8 steps, while later steps primarily refine visual detail, sharpness, and contrast. Adapter ablations confirm that identity formation depends on the identity…
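The backbone swap described in the abstract can be sketched as a configuration change. This is an illustration under stated assumptions, not the authors' code: `GenerationConfig`, `to_distilled`, the adapter path, the baseline guidance value of 3.5, and the default of 4 steps (chosen from the 4-8 range above) are all hypothetical. Only the two Hugging Face repository IDs and the 28-step baseline correspond to things named in or implied by the source.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GenerationConfig:
    backbone_path: str     # repo ID or local path of the FLUX backbone
    adapter_path: str      # frozen identity adapter weights (placeholder path)
    num_steps: int         # number of denoising steps
    guidance_scale: float  # 0.0 means guidance is disabled in this sketch


# Baseline: 28-step dev backbone. The guidance value is an assumption.
DEV_BASELINE = GenerationConfig(
    backbone_path="black-forest-labs/FLUX.1-dev",
    adapter_path="path/to/identity_adapter",  # hypothetical
    num_steps=28,
    guidance_scale=3.5,
)


def to_distilled(cfg, num_steps=4):
    """Swap in the distilled backbone and disable guidance.

    The adapter path is left untouched, mirroring the training-free
    claim that the same frozen adapter is reused.
    """
    return replace(
        cfg,
        backbone_path="black-forest-labs/FLUX.1-schnell",
        guidance_scale=0.0,
        num_steps=num_steps,
    )


FAST = to_distilled(DEV_BASELINE)
```

The point of the sketch is that the adapter weights are never touched; only the backbone path and the guidance setting change, with the step count lowered to match the distilled model.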
