Scaling Diffusion Transformers Efficiently via $\mu$P