TL;DR
This paper demonstrates that Fréchet Distance can be effectively optimized in representation space to improve visual generation quality and proposes a multi-representation metric for better evaluation.
Contribution
It introduces FD-loss, which decouples the population size used for FD estimation from the batch size used for gradient computation, and shows that optimizing it improves generator quality and informs how generators should be evaluated.
Findings
Optimizing FD-loss improves visual quality across representation spaces.
FD-loss enables multi-step generators to become strong one-step generators.
Modern representations can yield better samples despite worse Inception FID scores.
Abstract
We show that Fréchet Distance (FD), long considered impractical as a training objective, can in fact be effectively optimized in the representation space. Our idea is simple: decouple the population size for FD estimation (e.g., 50k) from the batch size for gradient computation (e.g., 1024). We term this approach FD-loss. Optimizing FD-loss reveals several surprising findings. First, post-training a base generator with FD-loss in different representation spaces consistently improves visual quality. Under the Inception feature space, a one-step generator achieves 0.72 FID on ImageNet 256x256. Second, the same FD-loss repurposes multi-step generators into strong one-step generators without teacher distillation, adversarial training, or per-sample targets. Third, FID can misrank visual quality: modern representations can yield better samples despite worse Inception FID. This motivates…
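For readers unfamiliar with the quantity being optimized, the following is a minimal sketch of the standard Fréchet Distance between two Gaussians fitted to feature sets, FD = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2)). It is an illustration only, not the paper's implementation: the function names `gaussian_stats` and `frechet_distance` are hypothetical, the features are synthetic rather than taken from any representation model, and the sketch does not reproduce the paper's decoupling of population size from batch size or any gradient computation.

```python
import numpy as np

def gaussian_stats(feats):
    """Return the mean and covariance of a (num_samples, dim) feature array."""
    mu = feats.mean(axis=0)
    sigma = np.cov(feats, rowvar=False)
    return mu, sigma

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet Distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    # Tr((S1 S2)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of S1 S2, which are real and non-negative for PSD inputs.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt)

# Synthetic stand-ins for "real" and "generated" features (assumption:
# 8-dimensional features; real pipelines use much higher dimensions).
rng = np.random.default_rng(0)
real = rng.normal(size=(5000, 8))
fake = rng.normal(loc=0.5, size=(5000, 8))

fd_same = frechet_distance(*gaussian_stats(real), *gaussian_stats(real))
fd_shift = frechet_distance(*gaussian_stats(real), *gaussian_stats(fake))
print(fd_same, fd_shift)
```

The mean and covariance here are estimated from all 5000 samples at once; the abstract's point is that these population-level statistics need many more samples (e.g., 50k) than fit in a single gradient batch (e.g., 1024).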
