Your LLM Knows the Future: Uncovering Its Multi-Token Prediction Potential

Mohammad Samragh; Arnav Kundu; David Harrison; Kumari Nishu; Devang Naik; Minsik Cho; Mehrdad Farajtabar

arXiv:2507.11851 · cs.CL · July 17, 2025

TL;DR

This paper introduces a framework that lets autoregressive language models predict multiple tokens simultaneously, speeding up inference by nearly 5x on code and math and by 2.5x on chat and knowledge tasks without sacrificing output quality.

Contribution

It presents a multi-token prediction method that combines a masked-input formulation, gated LoRA, and auxiliary losses to improve generation speed and coherence in autoregressive models.

Findings

01  Speeds up code and math generation by nearly 5x

02  Speeds up chat and knowledge tasks by 2.5x

03  Maintains output quality while delivering these speedups

Abstract

Autoregressive language models are constrained by their inherently sequential nature, generating one token at a time. This paradigm limits inference speed and parallelism, especially during later stages of generation when the direction and semantics of text are relatively certain. In this work, we propose a novel framework that leverages the inherent knowledge of vanilla autoregressive language models about future tokens, combining techniques to realize this potential and enable simultaneous prediction of multiple subsequent tokens. Our approach introduces several key innovations: (1) a masked-input formulation where multiple future tokens are jointly predicted from a common prefix; (2) a gated LoRA formulation that preserves the original LLM's functionality, while equipping it for multi-token prediction; (3) a lightweight, learnable sampler module that generates coherent sequences from…
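
As an illustration of items (1) and (2) in the abstract, the following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that the masked-input formulation appends placeholder mask tokens after the prefix, and that the gated LoRA update is switched on only at those mask positions, so ordinary tokens pass through the frozen base weights unchanged. The function names, the single shared `mask_id`, the binary gate, and the tiny matrices are all hypothetical.

```python
# Hypothetical sketch: names, shapes, and the gating rule are assumptions
# made for illustration, not details confirmed by the abstract.

def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def build_masked_input(prefix_ids, num_future, mask_id):
    """Append num_future mask tokens to the prefix.

    Returns the extended token ids and a per-position gate:
    0.0 for ordinary prefix tokens, 1.0 for the appended mask tokens.
    """
    ids = list(prefix_ids) + [mask_id] * num_future
    gates = [0.0] * len(prefix_ids) + [1.0] * num_future
    return ids, gates


def gated_lora_linear(x, gate, W, A, B):
    """Compute y = W x + gate * B (A x).

    With gate == 0.0 the low-rank branch is skipped, so the layer behaves
    exactly like the frozen base layer W.
    """
    base = matvec(W, x)
    if gate == 0.0:
        return base
    delta = matvec(B, matvec(A, x))
    return [b + gate * d for b, d in zip(base, delta)]


# Toy usage: a 2x2 base weight with a rank-1 LoRA update.
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight
A = [[1.0, 1.0]]               # LoRA down-projection (rank 1)
B = [[0.5], [-0.5]]            # LoRA up-projection

ids, gates = build_masked_input([11, 12, 13], num_future=2, mask_id=0)
x = [2.0, 3.0]
prefix_out = gated_lora_linear(x, gates[0], W, A, B)   # gate off: base only
mask_out = gated_lora_linear(x, gates[-1], W, A, B)    # gate on: base + LoRA
```

In this sketch the gate is what keeps the original model's behavior intact: at prefix positions the output equals the base layer's output, while the low-rank update only influences the positions reserved for future-token prediction.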
