How Does Critical Batch Size Scale in Pre-training?