Fast and Accurate Causal Parallel Decoding using Jacobi Forcing

Lanxiang Hu; Siqi Kou; Yichao Fu; Samyam Rajbhandari; Tajana Rosing; Yuxiong He; Zhijie Deng; Hao Zhang

arXiv:2512.14681 · cs.CL · December 17, 2025

TL;DR

This paper introduces Jacobi Forcing, a training paradigm that turns autoregressive models into efficient parallel decoders, speeding up large language model inference by 3.8x on coding and math benchmarks while largely preserving generation quality.

Contribution

It proposes Jacobi Forcing, a progressive distillation method that aligns parallel decoding trajectories with pretrained causal models, enabling faster inference with minimal performance loss.
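The "parallel decoding trajectories" mentioned above come from Jacobi decoding, where a block of draft tokens is refined in parallel until it stops changing. The sketch below shows that fixed-point iteration on a toy deterministic model, under stated assumptions: `toy_model`, `PREFIX`, `jacobi_decode_block`, and `trajectory_to_pairs` are hypothetical names, greedy decoding is assumed, and the pairing of intermediate states with the fixed point is a generic consistency-style construction. It is not the paper's progressive distillation schedule or loss, which this summary does not spell out.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for a causal LM under greedy decoding: it maps the
# token ids seen so far to the next token id.
NextToken = Callable[[Sequence[int]], int]


def toy_model(seq: Sequence[int]) -> int:
    """Toy deterministic 'model' (assumption): depends on the last two tokens."""
    return (3 * seq[-1] + seq[-2]) % 11


PREFIX = [2, 5]  # hypothetical prompt; toy_model needs at least two tokens


def autoregressive_greedy(model: NextToken, prefix: List[int], n: int) -> List[int]:
    """Reference decoder: n tokens, one sequential model call each."""
    out = list(prefix)
    for _ in range(n):
        out.append(model(out))
    return out[len(prefix):]


def jacobi_decode_block(
    model: NextToken, prefix: List[int], n: int, init_token: int = 0
) -> Tuple[List[int], List[List[int]]]:
    """Jacobi fixed-point iteration over one block of n tokens.

    Each sweep recomputes every block position from the previous guess; in a
    real causal LM the whole sweep is a single forward pass. After sweep k the
    first k tokens are final, so at most n sweeps are needed, and a fixed point
    matches the greedy autoregressive output. Returns (fixed_point, trajectory).
    """
    guess = [init_token] * n
    trajectory = [list(guess)]
    for _ in range(n):
        new_guess = [model(prefix + guess[:i]) for i in range(n)]
        if new_guess == guess:  # converged: nothing changed in this sweep
            break
        guess = new_guess
        trajectory.append(list(guess))
    return guess, trajectory


def trajectory_to_pairs(
    trajectory: List[List[int]],
) -> List[Tuple[List[int], List[int]]]:
    """Pair each intermediate state with the fixed point (hypothetical
    consistency-style training targets, not the paper's exact recipe)."""
    target = trajectory[-1]
    return [(state, target) for state in trajectory[:-1]]


if __name__ == "__main__":
    fixed, traj = jacobi_decode_block(toy_model, PREFIX, n=8)
    print("fixed point    :", fixed)
    print("AR greedy      :", autoregressive_greedy(toy_model, PREFIX, 8))
    print("state changes  :", len(traj) - 1)
    for state, target in trajectory_to_pairs(traj):
        print(state, "->", target)
```

Because the fixed point equals the greedy AR output, parallel decoding of this kind is lossless in the worst case; the speedup depends on how many tokens a trained model gets right per sweep, which is what training on such trajectories aims to improve.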

Findings

1. Achieves a 3.8x speedup on coding and math benchmarks.
2. Introduces multi-block decoding with rejection recycling, reaching up to 4.5x higher token acceptance and a 4.0x speedup.
3. Maintains near-original performance while delivering significant inference acceleration.
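To make "token acceptance" concrete, the sketch below counts how many tokens each parallel pass commits in a single-block greedy Jacobi decoder with verification. This is an illustrative assumption, not the paper's method: it does not implement multi-block decoding or rejection recycling, and `jacobi_with_commit` and `toy_model` are hypothetical names. A draft token is treated as verified only when every earlier draft token matched the model's own prediction.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for a causal LM under greedy decoding.
NextToken = Callable[[Sequence[int]], int]


def toy_model(seq: Sequence[int]) -> int:
    """Toy deterministic 'model' (assumption): depends on the last two tokens."""
    return (3 * seq[-1] + seq[-2]) % 11


def jacobi_with_commit(
    model: NextToken, prefix: List[int], total: int, block: int, init_token: int = 0
) -> Tuple[List[int], List[int]]:
    """Single-block greedy Jacobi decoding that commits verified tokens.

    Each loop iteration stands for one parallel forward pass over the draft.
    preds[0] is always exact; preds[i] is exact only if draft[:i] matched the
    model's own predictions, so a pass accepts (matching prefix + 1) tokens.
    Unverified predictions are kept as the next draft.
    Returns (generated_tokens, tokens_accepted_per_pass).
    """
    out = list(prefix)
    draft = [init_token] * block
    accepted_per_pass: List[int] = []
    while len(out) - len(prefix) < total:
        preds = [model(out + draft[:i]) for i in range(block)]
        k = 1
        while k < block and draft[k - 1] == preds[k - 1]:
            k += 1
        k = min(k, total - (len(out) - len(prefix)))  # do not overshoot total
        out.extend(preds[:k])
        accepted_per_pass.append(k)
        # Carry unverified predictions forward, padded back to block length.
        draft = preds[k:] + [init_token] * k
    return out[len(prefix):], accepted_per_pass


if __name__ == "__main__":
    tokens, accepted = jacobi_with_commit(toy_model, [2, 5], total=12, block=4)
    print("tokens            :", tokens)
    print("accepted per pass :", accepted)
    print("mean tokens/pass  :", sum(accepted) / len(accepted))
```

The output always equals greedy AR decoding; only the number of passes changes. Mean tokens per pass is one way to read an acceptance figure, while wall-clock speedup also depends on the cost of each parallel pass.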

Abstract

Multi-token generation has emerged as a promising paradigm for accelerating transformer-based large model inference. Recent efforts primarily explore diffusion Large Language Models (dLLMs) for parallel decoding to reduce inference latency. To match the generation quality of autoregressive (AR) models, many techniques adapt AR models into dLLMs to enable parallel decoding. However, they suffer from limited speedup compared to AR models due to a pretrain-to-posttrain mismatch. Specifically, the masked data distribution in post-training deviates significantly from the real-world data distribution seen during pretraining, and dLLMs rely on bidirectional attention, which conflicts with the causal prior learned during pretraining and hinders exact KV cache reuse. To address this, we introduce Jacobi Forcing, a progressive distillation paradigm where models are trained on their own generated…
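The abstract's point about bidirectional attention and KV cache reuse can be illustrated with a toy attention layer. This sketch assumes a single head over scalar tokens with q = k = v = x, far simpler than a real transformer; `attention_outputs` is a hypothetical helper. Under a causal mask, outputs at earlier positions do not change when a token is appended, so anything built from them (such as the next layer's keys and values) can be cached exactly. Under bidirectional attention they shift and must be recomputed.

```python
import math
from typing import List


def attention_outputs(x: List[float], causal: bool) -> List[float]:
    """Toy single-head attention over scalar tokens (q = k = v = x).

    With causal=True, position i attends only to positions j <= i; with
    causal=False (bidirectional, as in dLLMs) it attends to every position.
    """
    n = len(x)
    outs = []
    for i in range(n):
        visible = range(i + 1) if causal else range(n)
        scores = [x[i] * x[j] for j in visible]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        outs.append(sum(w * x[j] for w, j in zip(weights, visible)) / total)
    return outs


if __name__ == "__main__":
    prefix = [0.5, -1.0, 2.0]
    extended = prefix + [1.5]  # append one newly decoded token

    # Causal: earlier outputs are bit-identical, so cached states stay valid.
    print(attention_outputs(extended, True)[:3] == attention_outputs(prefix, True))  # True
    # Bidirectional: earlier outputs shift, so cached states go stale.
    print(attention_outputs(extended, False)[:3] == attention_outputs(prefix, False))  # False
```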
