What Happens During the Loss Plateau? Understanding Abrupt Learning in Transformers