Optimizing RLHF Training for Large Language Models with Stage Fusion
Yinmin Zhong, Zili Zhang, Bingyang Wu, Shengyu Liu, Yukun Chen,, Changyi Wan, Hanpeng Hu, Lei Xia, Ranchen Ming, Yibo Zhu, Xin Jin

TL;DR
RLHFuse introduces stage fusion techniques to optimize GPU utilization and significantly boost training throughput in RLHF for large language models by overlapping tasks and reducing pipeline inefficiencies.
Contribution
The paper proposes a novel stage fusion approach for RLHF training that enhances GPU efficiency by splitting tasks into finer subtasks and overlapping execution stages.
Findings
Training throughput increased by up to 3.7 times.
Effective mitigation of generation bottlenecks and pipeline bubbles.
Improved GPU utilization in RLHF workflows.
Abstract
We present RLHFuse, an efficient training system with stage fusion for Reinforcement Learning from Human Feedback (RLHF). Due to the intrinsic nature of RLHF training, i.e., the data skewness in the generation stage and the pipeline bubbles in the training stage, existing RLHF systems suffer from low GPU utilization. RLHFuse breaks the traditional view of RLHF workflow as a composition of individual tasks, splitting each task into finer-grained subtasks, and performing stage fusion to improve GPU utilization. RLHFuse contains two key ideas. First, for generation and inference tasks, RLHFuse splits them into sample-level subtasks, enabling efficient inter-stage fusion to overlap the execution of generation and inference stages, thus mitigating the original generation bottleneck dominated by long-tailed samples. Second, for training tasks, RLHFuse breaks them into subtasks of…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques
