RLHFSpec: Breaking the Efficiency Bottleneck in RLHF Training via Adaptive Drafting

Siqi Wang; Hailong Yang; Junjie Zhu; Xuezhu Wang; Yufan Xu; Depei Qian

arXiv:2512.04752 · cs.LG · December 15, 2025

Open Access

TL;DR

RLHFSpec applies adaptive speculative decoding to the generation stage of RLHF training for large language models, accelerating that stage and reducing overall training time.

Contribution

RLHFSpec is presented as the first attempt to integrate speculative decoding into the RLHF generation stage. It proposes workload-aware drafting strategy selection and sample reallocation to improve efficiency.

Findings

1. Achieves higher throughput in the generation stage.
2. Significantly speeds up overall RLHF training.
3. Outperforms state-of-the-art methods in efficiency.

Abstract

Reinforcement Learning from Human Feedback (RLHF) is an important fine-tuning technique for large language models (LLMs) and comprises three stages: generation, inference, and training. The generation stage produces samples that are then used to infer learnable experiences for training. We observe that the generation stage is the bottleneck of the entire execution process and consider it a key point for optimization. Specifically, we make the first attempt to integrate speculative decoding into the RLHF generation stage and propose RLHFSpec, an RLHF system that accelerates generation execution with efficient speculative decoding and sample reallocation. To fully exploit the performance potential provided by speculative decoding, especially when dealing with the dynamic workload of the generation stage, RLHFSpec proposes a workload-aware drafting strategy selection mechanism, which…
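
For readers unfamiliar with speculative decoding, the following is a minimal sketch of the generic greedy draft-then-verify loop that the abstract builds on. It is not RLHFSpec's implementation: the workload-aware drafting strategy selection and sample reallocation are not detailed in the truncated abstract and are not modeled here. The names `target_next`, `draft_next`, and `speculative_decode`, and the toy next-token rules, are assumptions made purely for illustration.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]


def target_next(seq: List[Token]) -> Token:
    """Hypothetical stand-in for the large model: a deterministic toy rule."""
    return (seq[-1] * 3 + len(seq)) % 7


def draft_next(seq: List[Token]) -> Token:
    """Hypothetical cheap draft model: agrees with the target most of the time."""
    guess = target_next(seq)
    # Deliberately disagree at every fourth position to exercise rejection.
    return (guess + 1) % 7 if len(seq) % 4 == 0 else guess


def greedy_decode(model: NextToken, prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one model call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(model(seq))
    return seq


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[Token], n_new: int, k: int
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding; returns the sequence and the number of rounds."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(seq) < goal:
        rounds += 1
        # 1) The draft model proposes up to k tokens autoregressively.
        proposal: List[Token] = []
        ctx = list(seq)
        for _ in range(min(k, goal - len(seq))):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2) The target checks each proposed position. A real system would
        #    score all k positions in a single batched forward pass.
        for tok in proposal:
            expected = target(seq)
            seq.append(expected)  # accepted token, or the target's correction
            if expected != tok:
                break  # discard the rest of the draft after the first mismatch
        else:
            # Every draft token was accepted: the target yields one bonus token.
            if len(seq) < goal:
                seq.append(target(seq))
    return seq, rounds


if __name__ == "__main__":
    out, rounds = speculative_decode(target_next, draft_next, [1, 2], n_new=20, k=4)
    print(out == greedy_decode(target_next, [1, 2], 20), rounds)
```

Under greedy verification the output matches what the target model alone would generate; the potential saving comes from needing fewer target rounds than generated tokens. How much is saved depends on the draft acceptance rate and the draft length `k`, which is the kind of trade-off a workload-aware selection mechanism would presumably tune.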

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications