Gated Normalization Removal and Scale Anchoring in Pre-Norm Transformers