Taming Transformer Without Using Learning Rate Warmup