Fast LLM Post-training via Decoupled and Fastest-of-N Speculation

Rongxin Cheng; Kai Zhou; Xingda Wei; Siyuan Liu; Mingcong Han; Mingjing Ai; Yeju Zhou; Baoquan Zhong; Wencong Xiao; Rong Chen; Haibo Chen

arXiv:2511.16193·cs.DC·December 24, 2025

TL;DR

This paper introduces SpecActor, a speculative decoding method that speeds up rollout in large language model (LLM) post-training by more than 2x. It does so by decoupling speculation and dynamically selecting among draft methods.

Contribution

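SpecActor contributes a Decoupled speculation method and a Fastest-of-N speculation method that improve rollout efficiency in LLM post-training. Correctness is preserved because the original model verifies the drafted outputs, and the choice of draft method adapts to rollout progress.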

Findings

01 Speeds up rollout by 2.0--2.4x

02 Up to 2.7x acceleration over baselines

03 Reduces end-to-end training time by 1.4--2.3x

Abstract

Rollout dominates the training time in large language model (LLM) post-training, where the trained model is used to generate tokens given a batch of prompts. This work, SpecActor, achieves fast rollout with speculative decoding, which deploys a fast draft path to accelerate the unparallelizable generation, while correctness is guaranteed by fast parallel verification of the outputs with the original model. SpecActor addresses two foundational challenges that hinder speculation efficiency with two techniques: (1) a Decoupled speculation method that overcomes the computational inefficiency of executing speculative decoding with a relatively large per-worker batch size -- a common configuration in training but unfriendly to speculation, and (2) a Fastest-of-N speculation method that selects and combines different draft methods according to the rollout progress to approximate the optimal draft method even…
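
The draft-then-verify loop that the abstract refers to can be illustrated with a minimal sketch of generic greedy speculative decoding. This is not SpecActor's implementation: it does not model Decoupled or Fastest-of-N speculation, and the names `speculative_generate`, `greedy_generate`, `toy_target`, and `toy_draft` are hypothetical, with toy deterministic functions standing in for real models. It only shows why verification by the original model keeps the output identical to plain decoding.

```python
from typing import Callable, List

Token = int
# A "model" here is just a greedy next-token function (an assumption for illustration).
Model = Callable[[List[Token]], Token]


def greedy_generate(target: Model, prompt: List[Token], max_new: int) -> List[Token]:
    """Plain sequential decoding with the target model, one token per step."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_generate(target: Model, draft: Model, prompt: List[Token],
                         max_new: int, k: int = 4) -> List[Token]:
    """Greedy speculative decoding: draft k tokens, then verify them with the target."""
    out = list(prompt)
    goal = len(prompt) + max_new
    while len(out) < goal:
        # 1) Draft path: propose k tokens sequentially with the cheap model.
        ctx = list(out)
        proposal = []
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)

        # 2) Verification: compare each drafted token with the target's choice.
        #    A real system scores all positions in one batched forward pass;
        #    this sketch calls the target per position for clarity.
        ctx = list(out)
        accepted = []
        for tok in proposal:
            expected = target(ctx)
            if expected != tok:
                accepted.append(expected)  # replace the first mismatch and stop
                break
            accepted.append(tok)
            ctx.append(tok)
        else:
            accepted.append(target(ctx))  # all drafts accepted: one bonus token

        out.extend(accepted)
    return out[:goal]


def toy_target(ctx: List[Token]) -> Token:
    """Toy stand-in for the trained model."""
    return (ctx[-1] * 3 + 1) % 7


def toy_draft(ctx: List[Token]) -> Token:
    """Toy stand-in for a draft method: agrees with the target most of the time."""
    guess = toy_target(ctx)
    return guess if len(ctx) % 3 else (guess + 1) % 7
```

Every token appended to the output is the target model's own greedy choice for its context, so the result matches `greedy_generate` regardless of how often the draft is wrong; draft quality affects only how many tokens are accepted per verification round.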

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Generative Adversarial Networks and Image Synthesis