Pre-Training Curriculum for Multi-Token Prediction in Language Models

Ansar Aynetdinov; Alan Akbik

arXiv:2505.22757 · cs.CL · May 30, 2025



TL;DR

This paper introduces a curriculum learning approach for multi-token prediction (MTP) pre-training in language models. The curriculum gradually changes the complexity of the objective between next-token prediction (NTP) and MTP, which helps smaller models in particular benefit from MTP in downstream performance.

Contribution

It proposes forward and reverse curriculum strategies for MTP training. Both improve the NTP performance of small models, but only the forward curriculum also keeps the decoding benefits of MTP.

Findings

01 Forward curriculum improves downstream NTP performance and output quality.

02 Reverse curriculum yields stronger NTP performance but lacks decoding benefits.

03 Curriculum strategies enable better leverage of MTP in small language models.

Abstract

Multi-token prediction (MTP) is a recently proposed pre-training objective for language models. Rather than predicting only the next token (NTP), MTP predicts the next $k$ tokens at each prediction step, using multiple prediction heads. MTP has shown promise in improving downstream performance, inference speed, and training efficiency, particularly for large models. However, prior work has shown that smaller language models (SLMs) struggle with the MTP objective. To address this, we propose a curriculum learning strategy for MTP training, exploring two variants: a forward curriculum, which gradually increases the complexity of the pre-training objective from NTP to MTP, and a reverse curriculum, which does the opposite. Our experiments show that the forward curriculum enables SLMs to better leverage the MTP objective during pre-training, improving downstream NTP performance and…
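
The forward and reverse curricula described in the abstract can be illustrated with a small schedule function. The following is a minimal sketch, not the authors' implementation: the names `active_heads` and `curriculum_loss`, the equal-length training stages, and the averaging of per-head losses are all assumptions made for illustration. The abstract only states that the objective moves gradually from NTP to MTP (forward) or the opposite way (reverse).

```python
from typing import Sequence


def active_heads(step: int, total_steps: int, max_k: int, reverse: bool = False) -> int:
    """Return how many prediction heads contribute to the loss at `step`.

    Assumption: training is split into `max_k` equal-length stages.
    Forward curriculum: 1 head (pure NTP) -> max_k heads (full MTP).
    Reverse curriculum: max_k heads -> 1 head.
    """
    if max_k < 1:
        raise ValueError("max_k must be at least 1")
    if not 0 <= step < total_steps:
        raise ValueError("step must be in [0, total_steps)")
    stage = step * max_k // total_steps  # integer in 0 .. max_k - 1
    return max_k - stage if reverse else stage + 1


def curriculum_loss(per_head_losses: Sequence[float], k: int) -> float:
    """Average the losses of the first `k` heads.

    Head 0 is taken to be the next-token head; heads 1.. predict tokens
    further ahead. Averaging (rather than summing) is an assumption.
    """
    if not 1 <= k <= len(per_head_losses):
        raise ValueError("k must be in [1, number of heads]")
    active = per_head_losses[:k]
    return sum(active) / len(active)


if __name__ == "__main__":
    total_steps, max_k = 8, 4
    forward = [active_heads(s, total_steps, max_k) for s in range(total_steps)]
    backward = [active_heads(s, total_steps, max_k, reverse=True) for s in range(total_steps)]
    print("forward:", forward)
    print("reverse:", backward)
```

With `total_steps=8` and `max_k=4`, the forward schedule activates 1, 1, 2, 2, 3, 3, 4, 4 heads over the eight steps, and the reverse schedule runs the same sequence backwards. A real training loop would compute one cross-entropy loss per head and pass the list to `curriculum_loss` together with the current value from `active_heads`.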


Code & Models

Repositories

aynetdia/mtp_curriculum
PyTorch · Official


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning in Healthcare