Sparsely Supervised Diffusion
Wenshuai Zhao, Zhiyuan Li, Yi Zhao, Mohammad Hassan Vali, Martin Trapp, Joni Pajarinen, Juho Kannala, Arno Solin

TL;DR
This paper introduces a sparsely supervised learning approach for diffusion models using a masking strategy, which improves global consistency, reduces memorization, and maintains competitive quality even with high levels of pixel masking.
Contribution
It presents a simple masking-based training method for diffusion models that enhances global coherence and stability, especially on small datasets.
Findings
Masking up to 98% of pixels is safe during training.
The method achieves competitive FID scores across experiments.
It prevents training instability on small datasets.
Abstract
Diffusion models have shown remarkable success across a wide range of generative tasks. However, they often suffer from spatially inconsistent generation, arguably due to the inherent locality of their denoising mechanisms. This can yield samples that are locally plausible but globally inconsistent. To mitigate this issue, we propose sparsely supervised learning for diffusion models, a simple yet effective masking strategy that can be implemented with only a few lines of code. Interestingly, the experiments show that it is safe to mask up to 98\% of pixels during diffusion model training. Our method delivers competitive FID scores across experiments and, most importantly, avoids training instability on small datasets. Moreover, the masking strategy reduces memorization and promotes the use of essential contextual information during generation.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGenerative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques
