Multi-Token Prediction via Self-Distillation