Linear attention is (maybe) all you need (to understand transformer optimization)