Fast Catch-Up, Late Switching: Optimal Batch Size Scheduling via Functional Scaling Laws | Tomesphere