Loading paper
Weight Decay may matter more than muP for Learning Rate Transfer in Practice | Tomesphere