On the Convergence of Gradient Descent on Learning Transformers with Residual Connections