Self-Distillation for Multi-Token Prediction

Guoliang Zhao; Ruobing Xie; An Wang; Shuaipeng Li; Huaibing Xie; Xingwu Sun

arXiv:2603.23911 · cs.CL · March 26, 2026

TL;DR

This paper introduces MTP-D, a self-distillation method for multi-token prediction in large language models that raises MTP head acceptance rates and inference speed at minimal additional training cost.

Contribution

The paper proposes MTP-D and a looped extension strategy, advancing multi-token prediction techniques for faster, more efficient large language model inference.

Findings

01. MTP-D raises MTP head acceptance rates by 7.5%.

02. The looped extension yields a further 220.4% inference speedup over 1-head MTP.

03. Extensive experiments across seven benchmarks validate the method's effectiveness.

Abstract

As Large Language Models (LLMs) scale up, inference efficiency becomes a critical bottleneck. Multi-Token Prediction (MTP) can accelerate LLM inference by predicting multiple future tokens in parallel. However, existing MTP approaches still face two challenges: limited acceptance rates of MTP heads, and difficulty in jointly training multiple MTP heads. We therefore propose MTP-D, a simple yet effective self-distillation method with minimal additional training cost, which boosts MTP head acceptance rates (+7.5%) while preserving main-head performance as far as possible. We also introduce a looped extension strategy for MTP-D that enables effective and economical MTP head extension and a further significant inference speedup over 1-head MTP (+220.4%). Moreover, we systematically explore and validate key insights on the distillation strategies and the potential scalability of MTP through extensive…
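
To make the link between head acceptance rates and decoding speed concrete, the following is a minimal sketch, not the paper's method or evaluation code. It assumes that a draft token from head k is kept only if all earlier heads were accepted, and that acceptances are independent; the function name and the example rates are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def expected_tokens_per_step(acceptance_rates: Sequence[float]) -> float:
    """Expected number of tokens emitted per decoding step.

    Assumptions (illustrative only, not from the paper):
    - the main head always contributes exactly one token;
    - the draft from MTP head k counts only if heads 1..k-1 were accepted;
    - acceptance events are treated as independent.
    """
    expected = 1.0      # token from the main head
    prefix_prob = 1.0   # probability that all heads so far were accepted
    for rate in acceptance_rates:
        if not 0.0 <= rate <= 1.0:
            raise ValueError("acceptance rates must lie in [0, 1]")
        prefix_prob *= rate
        expected += prefix_prob
    return expected


if __name__ == "__main__":
    # Hypothetical acceptance rates, chosen only to show the arithmetic.
    one_head = expected_tokens_per_step([0.80])
    three_heads = expected_tokens_per_step([0.80, 0.60, 0.40])
    print(f"1 head:  {one_head:.3f} tokens/step")
    print(f"3 heads: {three_heads:.3f} tokens/step")
```

Under these assumptions, each additional head helps only in proportion to the product of all earlier acceptance rates, which is why raising per-head acceptance matters more as heads are added. The figure is tokens per step, not wall-clock speedup, since it ignores the cost of running and verifying the extra heads.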

Taxonomy

Topics: Topic Modeling · Machine Learning and Data Classification · Machine Learning in Healthcare