How Transformers Learn to Plan via Multi-Token Prediction | Tomesphere