Attention-Only Transformers and Implementing MLPs with Attention Heads