Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot