RLHFSpec: Breaking the Efficiency Bottleneck in RLHF Training via Adaptive Drafting
Siqi Wang, Hailong Yang, Junjie Zhu, Xuezhu Wang, Yufan Xu, Depei Qian

TL;DR
RLHFSpec introduces an adaptive speculative decoding approach to significantly accelerate the generation stage in RLHF training of large language models, reducing overall training time.
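Speculative decoding, which RLHFSpec applies to the generation stage, can be illustrated with a minimal greedy draft-then-verify loop. This is a generic sketch, not RLHFSpec's implementation. The "models" are hypothetical plain functions that map a token prefix to the next greedy token. A drafted token is accepted only if it matches what the target model would have chosen. The names `speculative_step` and `generate` are illustrative.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def speculative_step(target: Model, draft: Model, prefix: List[int], k: int) -> List[int]:
    """One draft-then-verify step with greedy acceptance.

    The draft model proposes k tokens autoregressively; the target model keeps
    the longest agreeing prefix and then contributes one token of its own
    (the correction at the first mismatch, or a bonus token if all k match).
    """
    # 1. Draft k tokens with the cheap model.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. Verify. A real system scores all k positions in one batched target
    #    forward pass; here the target is called per position for clarity.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if expected != tok:
            accepted.append(expected)  # first mismatch: keep the target's token
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(target(ctx))  # all drafts accepted: one bonus token
    return accepted


def generate(target: Model, draft: Model, prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Decode max_new tokens after prompt using repeated speculative steps."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        out.extend(speculative_step(target, draft, out, k))
    return out[: len(prompt) + max_new]
```

With greedy acceptance, the output is identical to decoding with the target model alone. The speedup comes from the target verifying several drafted positions in a single forward pass, which this per-position loop does not model.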
Contribution
RLHFSpec is the first work to integrate speculative decoding into the RLHF generation stage. It proposes a workload-aware drafting strategy selection mechanism and sample reallocation to improve generation efficiency.
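This summary does not detail how sample reallocation works. As a hypothetical illustration, assume several generation instances each hold samples with estimated remaining decode lengths. In that setting, one could greedily migrate samples from the busiest instance to the idlest one, so that a few long-tail samples do not leave other instances idle. `rebalance`, its inputs, and the greedy rule are assumptions for illustration, not RLHFSpec's algorithm.

```python
from typing import Dict, List


def rebalance(instances: Dict[str, List[int]], max_moves: int = 100) -> Dict[str, List[int]]:
    """Greedily migrate samples from the most- to the least-loaded instance.

    `instances` maps a hypothetical instance id to the estimated remaining
    decode lengths of the samples it holds. A sample of length s is moved only
    if 0 < s < (load gap), which lowers the larger of the two loads.
    """
    # Copy so the caller's assignment is not mutated.
    plan = {name: list(lens) for name, lens in instances.items()}

    def load(name: str) -> int:
        return sum(plan[name])

    for _ in range(max_moves):
        src = max(plan, key=load)
        dst = min(plan, key=load)
        gap = load(src) - load(dst)
        # Moving s gives loads (src - s, dst + s); both stay below the old
        # src load only when s < gap. Prefer the largest such sample.
        candidates = [s for s in plan[src] if 0 < s < gap]
        if not candidates:
            break
        s = max(candidates)
        plan[src].remove(s)
        plan[dst].append(s)
    return plan
```

Each accepted move strictly reduces the sum of squared loads, so the loop terminates even without the `max_moves` cap. A real system would also weigh the cost of migrating a sample's KV cache against the load it saves.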
Findings
Achieves higher throughput in the generation stage.
Significantly speeds up overall RLHF training.
Outperforms state-of-the-art methods in efficiency.
Abstract
Reinforcement Learning from Human Feedback (RLHF) is an important fine-tuning technique for large language models (LLMs) and comprises three stages: generation, inference, and training. The generation stage produces samples, which are then used to infer learnable experiences for training. We observe that the generation stage is the bottleneck of the entire execution process and consider it a key target for optimization. Specifically, we make the first attempt to integrate speculative decoding into the RLHF generation stage and propose RLHFSpec, an RLHF system that accelerates generation with efficient speculative decoding and sample reallocation. To fully exploit the performance potential of speculative decoding, especially under the dynamic workload of the generation stage, RLHFSpec proposes a workload-aware drafting strategy selection mechanism, which…
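The abstract is cut off before it describes the workload-aware drafting strategy selection mechanism. The sketch below only illustrates why such a mechanism can matter. Assume a hypothetical latency model in which verification is memory-bound at small batch sizes and compute-bound at large ones. Under that model, the draft length `k` that maximizes throughput shrinks as the batch grows. The sketch uses the standard speculative-decoding estimate of expected tokens per step, (1 - α^(k+1)) / (1 - α), which assumes that each draft token is accepted independently with probability α. `CostModel`, its constants, and `choose_draft_length` are assumptions, not values or APIs from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    """Hypothetical per-step latency model; all times in ms."""
    draft_ms_per_token: float = 1.0     # one draft-model decoding step
    memory_floor_ms: float = 20.0       # target pass cost while memory-bound
    compute_ms_per_token: float = 0.05  # target cost per verified token once compute-bound

    def step_latency(self, k: int, batch_size: int) -> float:
        # Draft k tokens, then verify k + 1 positions per sequence in one target pass.
        verify = max(self.memory_floor_ms, self.compute_ms_per_token * batch_size * (k + 1))
        return k * self.draft_ms_per_token + verify


def expected_tokens(k: int, alpha: float) -> float:
    """Expected tokens emitted per step for draft length k (0 <= alpha < 1)."""
    return (1 - alpha ** (k + 1)) / (1 - alpha)


def choose_draft_length(batch_size: int, alpha: float = 0.8, max_k: int = 10,
                        cost: CostModel = CostModel()) -> int:
    """Pick k in [0, max_k] maximizing tokens per ms; k = 0 disables speculation."""
    def throughput(k: int) -> float:
        return batch_size * expected_tokens(k, alpha) / cost.step_latency(k, batch_size)

    return max(range(max_k + 1), key=throughput)
```

With the default constants, `choose_draft_length(8)` picks 8, `choose_draft_length(256)` picks 1, and `choose_draft_length(1024)` picks 0. In other words, speculation is switched off once the target model is compute-bound, which is the kind of workload dependence a fixed draft length cannot track.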
