Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models