Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps