Large-Step Training Dynamics of a Two-Factor Linear Transformer Model