Hyperparameter Transfer Enables Consistent Gains of Matrix-Preconditioned Optimizers Across Scales