Optimizing Attention with Mirror Descent: Generalized Max-Margin Token Selection