Sparsity Moves Computation: How FFN Architecture Reshapes Attention in Small Transformers