Pre-Training Curriculum for Multi-Token Prediction in Language Models
Ansar Aynetdinov, Alan Akbik

TL;DR
This paper introduces a curriculum learning approach for multi-token prediction in language models, enhancing training efficiency and downstream performance, especially for smaller models, by gradually increasing task complexity.
Contribution
It proposes forward and reverse curriculum strategies for MTP training, improving small model performance and decoding efficiency compared to prior methods.
Findings
Forward curriculum improves downstream NTP performance and output quality.
Reverse curriculum yields stronger NTP performance but loses the decoding-speed benefits of MTP.
Curriculum strategies enable better leverage of MTP in small language models.
Abstract
Multi-token prediction (MTP) is a recently proposed pre-training objective for language models. Rather than predicting only the next token (NTP), MTP predicts the next k tokens at each prediction step, using multiple prediction heads. MTP has shown promise in improving downstream performance, inference speed, and training efficiency, particularly for large models. However, prior work has shown that smaller language models (SLMs) struggle with the MTP objective. To address this, we propose a curriculum learning strategy for MTP training, exploring two variants: a forward curriculum, which gradually increases the complexity of the pre-training objective from NTP to MTP, and a reverse curriculum, which does the opposite. Our experiments show that the forward curriculum enables SLMs to better leverage the MTP objective during pre-training, improving downstream NTP performance and…
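The two curriculum variants described in the abstract can be illustrated with a small scheduling sketch. This is not the authors' implementation: the function names `active_heads` and `curriculum_loss`, the split of training into k equal-length stages, and the plain unweighted sum over head losses are all assumptions made for illustration. The sketch only shows the general idea that a forward curriculum starts with one active prediction head (NTP) and ends with k (MTP), while a reverse curriculum goes from k heads down to one.

```python
from typing import Sequence


def active_heads(step: int, total_steps: int, k: int, direction: str = "forward") -> int:
    """Return how many prediction heads contribute to the loss at `step`.

    Hypothetical schedule: training is split into k equal-length stages
    (an assumption for illustration, not taken from the paper).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not 0 <= step < total_steps:
        raise ValueError("step must satisfy 0 <= step < total_steps")

    # Index of the current stage, in the range 0 .. k-1.
    stage = min(step * k // total_steps, k - 1)

    if direction == "forward":
        # NTP first (1 head), growing towards full MTP (k heads).
        return stage + 1
    if direction == "reverse":
        # Full MTP first (k heads), shrinking towards NTP (1 head).
        return k - stage
    raise ValueError("direction must be 'forward' or 'reverse'")


def curriculum_loss(per_head_losses: Sequence[float], n_active: int) -> float:
    """Sum the losses of the first `n_active` heads; the rest are ignored.

    Head 0 is assumed to be the next-token head, head i predicts the
    token i + 1 positions ahead. The unweighted sum is an assumption.
    """
    if not 1 <= n_active <= len(per_head_losses):
        raise ValueError("n_active must be between 1 and the number of heads")
    return float(sum(per_head_losses[:n_active]))
```

In a training loop, one would call `active_heads` once per step and pass the result to `curriculum_loss` together with the per-head losses computed by the model, so that only the currently active heads receive gradient signal from the objective.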
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning in Healthcare
