RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale