One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention