Weight decay induces low-rank attention layers