Transformers Trained via Gradient Descent Can Provably Learn a Class of Teacher Models