Scaling Optimal LR Across Token Horizons