Parallel Token Prediction for Language Models

Felix Draxler; Justus Will; Farrin Marouf Sofian; Theofanis Karaletsos; Sameer Singh; Stephan Mandt

arXiv:2512.21323 · cs.CL · March 6, 2026

Open Access

TL;DR

Parallel Token Prediction (PTP) speeds up autoregressive language models by predicting multiple tokens in a single forward pass while still modeling the dependencies between them, reaching a 2.4x speedup on a diverse-task speculative decoding benchmark.

Contribution

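The paper introduces PTP, a general-purpose framework that predicts multiple tokens in one forward pass by moving the source of randomness from post-hoc sampling to random input variables, so that future tokens become deterministic functions of those inputs. The authors prove that a single PTP call can represent arbitrary dependencies between tokens, and they train PTP either by distilling an existing model or through inverse autoregressive training without a teacher.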

Findings

01 Achieves a 2.4x speedup on a diverse-task speculative decoding benchmark.

02 A single PTP call can represent arbitrary dependencies between tokens.

03 Code and checkpoints are available at https://github.com/mandt-lab/ptp.

Abstract

Autoregressive decoding in language models is inherently slow, generating only one token per forward pass. We propose Parallel Token Prediction (PTP), a general-purpose framework for predicting multiple tokens in a single model call. PTP moves the source of randomness from post-hoc sampling to random input variables, making future tokens deterministic functions of those inputs and thus jointly predictable in a single forward pass. We prove that a single PTP call can represent arbitrary dependencies between tokens. PTP is trained by distilling an existing model or through inverse autoregressive training without a teacher. Experimentally, PTP achieves a 2.4x speedup on a diverse-task speculative decoding benchmark. We provide code and checkpoints at https://github.com/mandt-lab/ptp.
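
The idea of moving randomness from post-hoc sampling to random inputs can be illustrated with inverse-CDF sampling. The sketch below is not the authors' implementation: the bigram table `NEXT_PROBS`, the vocabulary `VOCAB`, and the helper names `token_from_uniform` and `decode_with_noise` are hypothetical and chosen only for illustration. It shows that once one uniform variable per position is fixed up front, every future token is a deterministic function of the context and those variables. The abstract describes PTP as predicting such tokens jointly in one forward pass; the loop here only serves as the sequential reference.

```python
import random
from bisect import bisect_right
from itertools import accumulate

# Hypothetical toy model: a bigram table over a 3-token vocabulary.
VOCAB = ["a", "b", "c"]
NEXT_PROBS = {
    "a": [0.1, 0.6, 0.3],
    "b": [0.5, 0.2, 0.3],
    "c": [0.3, 0.3, 0.4],
}

def token_from_uniform(probs, u):
    """Inverse-CDF sampling: map u in [0, 1) to a token index."""
    cdf = list(accumulate(probs))
    # Clamp in case floating-point rounding leaves cdf[-1] slightly below 1.
    return min(bisect_right(cdf, u), len(probs) - 1)

def decode_with_noise(prev, noise):
    """Decode one token per uniform in `noise`, one step at a time.

    All randomness lives in `noise`; given (prev, noise) the output is fixed.
    """
    out = []
    for u in noise:
        idx = token_from_uniform(NEXT_PROBS[prev], u)
        prev = VOCAB[idx]
        out.append(prev)
    return out

# Draw the noise first, then decode: same noise, same tokens.
rng = random.Random(0)
noise = [rng.random() for _ in range(4)]
first = decode_with_noise("a", noise)
second = decode_with_noise("a", noise)
print(first == second)
```

Because the mapping from (context, noise) to tokens is deterministic, it is a well-defined target that a model could in principle learn to output for several positions at once, which is the reframing the abstract relies on.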

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis