The Cosine Schedule is Fisher-Rao-Optimal for Masked Discrete Diffusion Models
Leo Zhang, Saifuddin Syed

TL;DR
This paper demonstrates that the cosine schedule used in masked discrete diffusion models is optimal according to Fisher-Rao information geometry, providing a theoretical justification for its effectiveness.
Contribution
The work establishes the Fisher-Rao geometry as the basis for optimal discretisation schedules, revealing the cosine schedule as Fisher-Rao-optimal for masked discrete diffusion models.
Findings
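The cosine schedule is Fisher-Rao-optimal for masked discrete diffusion models.
The optimal discretisation schedule under the Fisher-Rao geometry coincides with the cosine schedule already popular in practice.
The information geometry of the induced probability path gives a theoretical basis for choosing the discretisation schedule.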
Abstract
In this work, we study the problem of choosing the discretisation schedule for sampling from masked discrete diffusion models in terms of the information geometry of the induced probability path. Specifically, we show that the optimal schedule under the Fisher-Rao geometry recovers the popularly-used cosine schedule.
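The abstract's claim can be illustrated numerically with a minimal sketch. It assumes, and this page does not state it, that the probability path is governed by a single per-token probability alpha of being unmasked, so that the relevant geometry is that of a Bernoulli family, and that the cosine schedule takes the form alpha_k = cos^2(pi * k / (2N)). The function names are hypothetical and the parameterisation used in the paper may differ. Under these assumptions, consecutive points of the schedule are equally spaced in Fisher-Rao distance.

```python
import math


def cosine_schedule(num_steps):
    """Assumed cosine schedule: alpha_k = cos^2(pi * k / (2 * num_steps)).

    alpha_k is taken to be the probability that a token is still unmasked
    at discretisation step k, for k = 0..num_steps.
    """
    return [
        math.cos(math.pi * k / (2 * num_steps)) ** 2
        for k in range(num_steps + 1)
    ]


def bernoulli_fisher_rao_distance(p, q):
    """Fisher-Rao distance between Bernoulli(p) and Bernoulli(q).

    Uses the closed form 2 * arccos(sqrt(p*q) + sqrt((1-p)*(1-q))).
    """
    affinity = math.sqrt(p * q) + math.sqrt((1 - p) * (1 - q))
    # Clamp to [0, 1] to guard against floating-point rounding.
    return 2 * math.acos(min(1.0, max(0.0, affinity)))


num_steps = 8
alphas = cosine_schedule(num_steps)

# Fisher-Rao distance covered by each discretisation step.
gaps = [
    bernoulli_fisher_rao_distance(a, b)
    for a, b in zip(alphas, alphas[1:])
]

for k, gap in enumerate(gaps):
    print(f"step {k} -> {k + 1}: Fisher-Rao distance = {gap:.6f}")
```

Each step covers the same distance, pi / num_steps, which is the sense in which a constant-speed traversal of the path under the Fisher-Rao metric yields a cosine-shaped schedule in this simplified setting.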
