Optimization Benchmark for Diffusion Models on Dynamical Systems

Fabian Schaipp

arXiv:2510.19376 · cs.LG · October 23, 2025

Open Access

TL;DR

This paper benchmarks recent optimization algorithms for training a diffusion model that denoises flow trajectories. Muon and SOAP reach an 18% lower final loss than AdamW. The paper also examines how the learning-rate schedule affects training dynamics and how large the performance gap between Adam and SGD is.

Contribution

The paper benchmarks recent optimization algorithms on a diffusion-model training task, a setting that is often missing when new optimizers are evaluated, and identifies Muon and SOAP as highly efficient alternatives to AdamW.

Findings

01

Muon and SOAP reach an 18% lower final loss than AdamW.

02

The learning-rate schedule affects the training dynamics, a phenomenon previously studied for text and image models.

03

There is a notable performance gap between Adam and SGD in this context.

Abstract

The training of diffusion models is often absent in the evaluation of new optimization techniques. In this work, we benchmark recent optimization algorithms for training a diffusion model for denoising flow trajectories. We observe that Muon and SOAP are highly efficient alternatives to AdamW (18% lower final loss). We also revisit several recent phenomena related to the training of models for text or image applications in the context of diffusion model training. This includes the impact of the learning-rate schedule on the training dynamics, and the performance gap between Adam and SGD.
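To make the headline number concrete, here is a minimal sketch of how an "X% lower final loss" figure can be computed, assuming it is the relative reduction of one optimizer's final loss with respect to a baseline. The function name `relative_loss_reduction` and the loss values are hypothetical placeholders for illustration only; they are not taken from the paper.

```python
def relative_loss_reduction(baseline_loss: float, candidate_loss: float) -> float:
    """Return the fraction by which candidate_loss lies below baseline_loss."""
    if baseline_loss <= 0:
        raise ValueError("baseline_loss must be positive")
    return (baseline_loss - candidate_loss) / baseline_loss


# Placeholder values chosen only to illustrate the arithmetic (not paper results).
baseline = 1.00   # hypothetical final loss of the baseline optimizer
candidate = 0.82  # hypothetical final loss of the alternative optimizer

reduction = relative_loss_reduction(baseline, candidate)
print(f"{reduction:.0%} lower final loss")
```

Under this reading, the figure describes the loss reached at the end of training, not a reduction in training time or compute.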

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Medical Image Segmentation Techniques · COVID-19 diagnosis using AI