Exploring Scaling Laws for Local SGD in Large Language Model Training