Improving Generative Adversarial Networks with Self-Distillation

Antoni Nowinowski; Krzysztof Krawiec

arXiv:2605.08577 · cs.CV · May 12, 2026

TL;DR

This paper introduces SD-GAN, a training method that uses the EMA (exponential moving average) generator as a teacher for the actively trained generator, improving GAN training stability and image quality through self-distillation.
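For context, the EMA generator is a copy of the generator whose weights track an exponential moving average of the trained weights, θ_EMA ← β·θ_EMA + (1 − β)·θ. The minimal sketch below is our own illustration, not code from the paper: the helper names `ema_update` and `demo`, the decay β = 0.9, and the single oscillating "weight" are hypothetical choices meant only to show why an averaged copy can sit closer to a good value than weights that keep oscillating around it.

```python
def ema_update(ema_params, params, beta=0.999):
    """One EMA step, applied per parameter: beta * ema + (1 - beta) * current."""
    return [beta * e + (1.0 - beta) * p for e, p in zip(ema_params, params)]


def demo(steps=200, beta=0.9):
    """Hypothetical toy: one weight oscillates around 1.0 and never settles."""
    weights = [1.5]          # first iterate of the toy weight
    ema = list(weights)      # the EMA copy starts as a copy of the generator
    for t in range(1, steps):
        weights = [1.0 + 0.5 * (-1) ** t]   # 0.5, 1.5, 0.5, ...
        ema = ema_update(ema, weights, beta)
    return weights[0], ema[0]


if __name__ == "__main__":
    raw, ema = demo()
    print(f"last raw weight: {raw:.3f}, EMA weight: {ema:.3f}")
```

After the transient dies out, the raw weight is still 0.5 away from 1.0 while the EMA value stays within a few hundredths of it. Larger β (such as the 0.999 default above) averages over a longer window at the cost of lagging further behind the trained weights.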

Contribution

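The paper proposes a self-distillation approach for GANs in which the EMA generator, normally used only for final deployment, guides the active generator during training via a perceptual loss, improving training stability and final image quality.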
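The structure of the student objective can be sketched as follows, under loudly stated assumptions: the student (active generator) and the EMA teacher receive the same latent codes, the teacher output is treated as a constant target, and the distillation term is weighted by a hypothetical coefficient `lam`. The paper specifies a perceptual loss; here a fixed elementwise `tanh` feature map is a stand-in for a frozen perceptual network, and the linear toy `generator` is likewise hypothetical.

```python
import math
import random


def generator(weights, z):
    """Hypothetical toy generator: elementwise scaling of the latent vector."""
    return [w * zi for w, zi in zip(weights, z)]


def features(x):
    """Stand-in for a frozen perceptual feature extractor (assumption)."""
    return [math.tanh(v) for v in x]


def self_distillation_loss(student_w, teacher_w, latents):
    """Mean squared feature distance between student and EMA-teacher outputs
    on the SAME latent codes; the teacher acts as a fixed target."""
    total = 0.0
    for z in latents:
        fs = features(generator(student_w, z))
        ft = features(generator(teacher_w, z))
        total += sum((a - b) ** 2 for a, b in zip(fs, ft)) / len(fs)
    return total / len(latents)


def generator_objective(adv_loss, student_w, teacher_w, latents, lam=1.0):
    """Hypothetical combined loss: adversarial term + lam * distillation term."""
    return adv_loss + lam * self_distillation_loss(student_w, teacher_w, latents)


if __name__ == "__main__":
    rng = random.Random(0)
    latents = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]
    student = [1.0, 0.5, -0.3, 2.0]
    teacher = [0.9, 0.6, -0.2, 1.8]  # e.g. EMA weights lagging the student
    print(self_distillation_loss(student, teacher, latents))
    print(generator_objective(0.7, student, teacher, latents, lam=0.5))
```

Sharing the latent codes is what makes the comparison meaningful: the term penalizes the student for drifting away from what the averaged model produces for the same input, rather than comparing unrelated samples.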

Findings

01. SD-GAN improves FID and random-FID scores across the evaluated architectures and datasets.
02. It stabilizes GAN training and reduces parasitic cycling.
03. It is also effective for fine-tuning pretrained GANs.
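For readers less familiar with the headline metric: FID fits Gaussians to feature embeddings of real and generated images and reports ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}); lower is better. The sketch below is a simplified illustration, not the paper's evaluation code: it works on one-dimensional feature samples, where the trace term collapses to (σ_r − σ_g)², and uses the sample standard deviation. Real FID uses high-dimensional Inception features, and the paper's random-FID variant is not reproduced here.

```python
import statistics


def fid_1d(real, fake):
    """Frechet distance between 1-D Gaussian fits of two feature samples.

    General form: ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^{1/2}).
    In one dimension the trace term reduces to (sigma_r - sigma_g)^2.
    """
    mu_r, mu_g = statistics.mean(real), statistics.mean(fake)
    sd_r, sd_g = statistics.stdev(real), statistics.stdev(fake)  # sample std
    return (mu_r - mu_g) ** 2 + (sd_r - sd_g) ** 2


if __name__ == "__main__":
    print(fid_1d([0, 1, 2, 3], [1, 2, 3, 4]))  # mean shift only
    print(fid_1d([0, 1, 2, 3], [0, 2, 4, 6]))  # mean and spread differ
```

The two terms separate a shift in the feature mean from a mismatch in spread, so a generator can score poorly on FID either by producing off-distribution samples or by producing too little (or too much) variety.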

Abstract

In modern GANs, maintaining an Exponential Moving Average (EMA) of the generator's weights is standard practice, as such an averaged model consistently outperforms the actively trained generator. However, the EMA generator is used for final deployment only and does not influence the training process. To address this missed opportunity, we introduce Self-Distilled GAN (SD-GAN), which employs the EMA generator as a teacher to guide the active generator (student) via a perceptual loss. We prove the local asymptotic stability of SD-GAN in the Dirac-GAN setting and show that it dampens the parasitic cycling behavior that plagues conventional GANs. Empirical evaluations across established architectures and datasets demonstrate that SD-GAN improves the final image quality on several metrics (FID and random-FID in particular), stabilizes the optimization trajectory and provides additional…
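The cycling claim can be illustrated numerically in the usual Dirac-GAN toy: real data sit at 0, the generator outputs the single point θ, the discriminator is D(x) = ψx, and the loss is f(ψθ) + f(0) with f(t) = −log(1 + e^(−t)). The sketch below is our own simulation, not the paper's proof. It assumes the perceptual loss reduces to a squared distance (λ/2)(θ − θ_EMA)² because the generator output is just θ, and it assumes simultaneous updates with step size h = 0.1, λ = 1 and EMA decay β = 0.99; all of these values and the helper names are hypothetical.

```python
import math


def f_prime(t):
    """Derivative of f(t) = -log(1 + exp(-t)), i.e. sigmoid(-t), computed stably."""
    if t >= 0:
        e = math.exp(-t)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(t))


def simulate(steps=3000, h=0.1, lam=0.0, beta=0.99, theta=1.0, psi=1.0):
    """Simultaneous gradient steps on the Dirac-GAN; returns final (theta, psi).

    lam = 0 gives the plain GAN; lam > 0 adds the hypothetical distillation
    term (lam / 2) * (theta - theta_ema)^2 to the generator loss.
    """
    ema = theta  # the EMA teacher starts as a copy of the generator
    for _ in range(steps):
        g = f_prime(theta * psi)
        # Generator descends on f(psi * theta) plus distillation; teacher is constant.
        new_theta = theta - h * (psi * g + lam * (theta - ema))
        # Discriminator ascends on f(psi * theta).
        new_psi = psi + h * theta * g
        # EMA of the generator weight, updated from the pre-step value here.
        ema = beta * ema + (1.0 - beta) * theta
        theta, psi = new_theta, new_psi
    return theta, psi


if __name__ == "__main__":
    r0 = math.hypot(1.0, 1.0)
    plain = math.hypot(*simulate(lam=0.0))
    distilled = math.hypot(*simulate(lam=1.0))
    print(f"start {r0:.3f}, plain GAN {plain:.3f}, with EMA teacher {distilled:.2e}")
```

Without the teacher, each simultaneous step multiplies θ² + ψ² by (1 + h²·f′(θψ)²), so the iterates can only spiral outward around the equilibrium. With the pull toward the lagging EMA copy, the distance to (0, 0) shrinks instead, which is qualitatively the damping effect described above.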
