Deriving Hyperparameter Scaling Laws via Modern Optimization Theory