Optimization Benchmark for Diffusion Models on Dynamical Systems
Fabian Schaipp

TL;DR
This paper benchmarks recent optimization algorithms for training a diffusion model on dynamical systems. Muon and SOAP outperform AdamW, and the paper also examines how the learning-rate schedule shapes training dynamics and how Adam compares with SGD.
Contribution
It benchmarks recent optimization algorithms on a diffusion model that denoises flow trajectories, a setting often absent when new optimizers are evaluated, and identifies Muon and SOAP as efficient alternatives to AdamW.
Findings
Muon and SOAP reach an 18% lower final loss than AdamW.
The learning-rate schedule affects the training dynamics of the diffusion model.
There is a performance gap between Adam and SGD in diffusion model training.
Abstract
The training of diffusion models is often absent in the evaluation of new optimization techniques. In this work, we benchmark recent optimization algorithms for training a diffusion model for denoising flow trajectories. We observe that Muon and SOAP are highly efficient alternatives to AdamW (18% lower final loss). We also revisit several recent phenomena related to the training of models for text or image applications in the context of diffusion model training. This includes the impact of the learning-rate schedule on the training dynamics, and the performance gap between Adam and SGD.
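One plausible reading of the "18% lower final loss" figure is a relative reduction measured against AdamW's final loss. The abstract does not state the formula, so this reading is an assumption. The sketch below shows that calculation; the function name `relative_loss_reduction` and the two loss values are hypothetical and chosen only for illustration, not taken from the paper.

```python
def relative_loss_reduction(baseline_loss: float, candidate_loss: float) -> float:
    """Return the fraction by which candidate_loss lies below baseline_loss."""
    if baseline_loss <= 0:
        raise ValueError("baseline_loss must be positive")
    return (baseline_loss - candidate_loss) / baseline_loss


# Hypothetical final losses, for illustration only (not results from the paper).
adamw_final = 0.50      # assumed baseline (AdamW) final loss
candidate_final = 0.41  # assumed final loss of an alternative optimizer

reduction = relative_loss_reduction(adamw_final, candidate_final)
print(f"{reduction:.0%} lower final loss than the baseline")
```

Under this reading, a reduction of 0.18 means the alternative optimizer's final loss is 82% of the baseline's. It says nothing about wall-clock time or the number of training steps.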
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Medical Image Segmentation Techniques · COVID-19 diagnosis using AI
