Half the Nonlinearity Is Wasted: Measuring and Reallocating the Transformer's MLP Budget