Sparse Growing Transformer: Training-Time Sparse Depth Allocation via Progressive Attention Looping