Fast LLM Post-training via Decoupled and Fastest-of-N Speculation
Rongxin Cheng, Kai Zhou, Xingda Wei, Siyuan Liu, Mingcong Han, Mingjing Ai, Yeju Zhou, Baoquan Zhong, Wencong Xiao, Rong Chen, Haibo Chen

TL;DR
This paper introduces SpecActor, a system that uses speculative decoding to accelerate the rollout phase of large language model (LLM) post-training. It decouples speculation from the main generation path and dynamically selects among draft methods, achieving a 2.0–2.4x rollout speedup.
Contribution
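SpecActor presents a novel Decoupled speculation approach and a Fastest-of-N speculation method to improve rollout efficiency in LLM post-training. Correctness is preserved because every drafted token is verified by the original model, and adaptability comes from switching draft methods as rollout progresses.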
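One way to picture "selecting a draft method according to rollout progress" is a throughput-driven chooser. The sketch below is a hypothetical policy, not the paper's Fastest-of-N algorithm. It assumes each candidate drafter has a known per-token draft cost, a fixed verification cost, and a per-token acceptance probability tracked with an exponential moving average. It then picks the drafter with the highest expected tokens per unit time, using the standard i.i.d.-acceptance estimate of (1 − a^(k+1)) / (1 − a) tokens per step. Combining drafters, which the paper also does, is not modeled. All names (`DraftStats`, `pick_fastest`, `small_model`, `ngram`) and numbers are illustrative.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DraftStats:
    """Running statistics for one candidate draft method (hypothetical)."""
    draft_cost: float         # time to propose one token with this drafter
    verify_cost: float        # time for one verification pass of the target model
    k: int                    # tokens proposed per speculation step
    accept_rate: float = 0.5  # EMA estimate of per-token acceptance probability

    def expected_tokens(self) -> float:
        # Expected tokens emitted per step under an i.i.d. acceptance model:
        # 1 + a + a^2 + ... + a^k = (1 - a^(k+1)) / (1 - a)
        a = self.accept_rate
        if a >= 1.0:
            return float(self.k + 1)
        return (1.0 - a ** (self.k + 1)) / (1.0 - a)

    def throughput(self) -> float:
        # Estimated tokens per unit time for one draft-then-verify step.
        return self.expected_tokens() / (self.k * self.draft_cost + self.verify_cost)

    def observe(self, accepted: int, proposed: int, beta: float = 0.2) -> None:
        # Exponential moving average so the estimate tracks rollout progress.
        # accepted/proposed is only a crude proxy for the per-token probability.
        if proposed > 0:
            self.accept_rate = (1 - beta) * self.accept_rate + beta * accepted / proposed


def pick_fastest(methods: Dict[str, DraftStats]) -> str:
    """Return the draft method with the highest estimated tokens per unit time."""
    return max(methods, key=lambda name: methods[name].throughput())


methods = {
    "small_model": DraftStats(draft_cost=0.1, verify_cost=1.0, k=4, accept_rate=0.8),
    "ngram": DraftStats(draft_cost=0.01, verify_cost=1.0, k=4, accept_rate=0.3),
}
print(pick_fastest(methods))  # small_model

# Later in the rollout the small model's drafts stop being accepted.
for _ in range(4):
    methods["small_model"].observe(accepted=0, proposed=4)
print(pick_fastest(methods))  # ngram
```

The point of the toy is the switch: as the observed acceptance of the more expensive drafter drifts down, the cheaper drafter becomes the faster choice, which is the kind of progress-dependent decision the paper describes.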
Findings
Achieves a 2.0–2.4x rollout speedup
Up to 2.7x acceleration over baselines
Makes end-to-end training 1.4–2.3x faster
Abstract
Rollout dominates the training time in large language model (LLM) post-training, where the trained model is used to generate tokens given a batch of prompts. This work, SpecActor, achieves fast rollout with speculative decoding, which deploys a fast draft path to accelerate the otherwise unparallelizable generation, while correctness is guaranteed by fast parallel verification of the outputs with the original model. SpecActor addresses two foundational challenges that hinder speculation efficiency: (1) a Decoupled speculation method that overcomes the computation inefficiency of executing speculative decoding with a relatively large per-worker batch size, a configuration common in training but unfriendly to speculation, and (2) a Fastest-of-N speculation method that selects and combines different draft methods according to the rollout progress to approximate the optimal draft method even…
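The draft-then-verify loop described in the abstract can be illustrated with a minimal sketch of generic greedy speculative decoding. It assumes deterministic greedy decoding and uses toy integer "models"; `toy_target`, `toy_draft`, and every function name here are hypothetical. This is not SpecActor's implementation and models neither the decoupled execution nor sampling-based acceptance. It does show why correctness holds: only drafted tokens that match what the original model would emit are kept, so the output equals plain greedy decoding.

```python
from typing import Callable, List

# Hypothetical model interface: maps a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def greedy_decode(target: Model, prompt: List[int], max_new: int) -> List[int]:
    """Reference: plain token-by-token greedy generation with the target model."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out[len(prompt):]


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: draft up to k tokens, then verify them."""
    out = list(prompt)
    generated = 0
    while generated < max_new:
        # 1) Draft phase: the cheap model proposes tokens one after another.
        proposal: List[int] = []
        ctx = list(out)
        for _ in range(min(k, max_new - generated)):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2) Verify phase: the target's choice at every proposal position.
        #    A real engine computes these in ONE batched forward pass.
        targets = [target(out + proposal[:i]) for i in range(len(proposal) + 1)]
        # 3) Keep the longest prefix on which draft and target agree.
        n_ok = 0
        while n_ok < len(proposal) and proposal[n_ok] == targets[n_ok]:
            n_ok += 1
        # The target's own token at the first mismatch (or a bonus token if all
        # drafts matched) is always correct, so each step emits at least one token.
        accepted = (proposal[:n_ok] + [targets[n_ok]])[: max_new - generated]
        out.extend(accepted)
        generated += len(accepted)
    return out[len(prompt):]


def toy_target(prefix: List[int]) -> int:
    # Hypothetical stand-in for the trained model's greedy choice.
    return (sum(prefix) * 31 + len(prefix)) % 50


def toy_draft(prefix: List[int]) -> int:
    # Hypothetical drafter: agrees with the target except at every third position.
    guess = toy_target(prefix)
    return guess if len(prefix) % 3 else (guess + 1) % 50


prompt = [1, 2, 3]
spec = speculative_decode(toy_target, toy_draft, prompt, 20)
ref = greedy_decode(toy_target, prompt, 20)
print(spec == ref)  # True
```

The speedup in practice comes from step 2: verifying k drafted tokens costs roughly one target forward pass instead of k sequential ones, which is why the draft path can be fast without compromising the output.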
