An Adaptive Placement and Parallelism Framework for Accelerating RLHF Training
Youshao Xiao, Zhenglei Zhou, Fagui Mao, Weichang Wu, Shangchun Zhao, Lin Ju, Lei Liang, Xiaolu Zhang, Jun Zhou

TL;DR
This paper introduces an adaptive framework with flexible placement strategies to improve the efficiency of RLHF training for large language models, significantly reducing bottlenecks and boosting throughput.
Contribution
It proposes two novel model placement strategies, Interleaving and Disaggregated, to optimize resource utilization and training speed in RLHF workflows.
Findings
Achieves up to 11x speedup over SOTA methods
Reduces memory redundancy and communication costs
Enhances training throughput and efficiency
Abstract
Recently, large language models (LLMs) such as ChatGPT and InstructGPT have made a significant impact in the AI world. Many works have attempted to reproduce InstructGPT's complex training pipeline, namely Reinforcement Learning with Human Feedback (RLHF). However, mainstream distributed RLHF training methods typically adopt a fixed model placement strategy, referred to as the Co-located strategy. This strategy treats all four interdependent models involved in RLHF as a single entity, distributing them across all devices and applying parallelism techniques designed for a single model, regardless of the workload heterogeneity inherent to each model. As a result, this strategy exacerbates the generation bottlenecks in RLHF training and degrades overall training efficiency. To address these issues, we propose a flexible model placement framework that offers two general and agile…
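To make the placement idea concrete, here is a minimal sketch of what "Co-located" means as the abstract describes it: every one of the four RLHF models is spread over every device. The model names (actor, critic, reference, reward) follow the usual InstructGPT-style PPO setup. The function names and the `partitioned_placement` alternative are hypothetical illustrations only; the abstract is cut off before it describes the paper's Interleaving and Disaggregated strategies, so this sketch does not claim to reproduce either of them.

```python
from typing import Dict, List

# The four models of an InstructGPT-style RLHF (PPO) pipeline.
MODELS = ["actor", "critic", "reference", "reward"]


def colocated_placement(num_devices: int) -> Dict[str, List[int]]:
    """Co-located strategy as described in the abstract:
    every model is distributed across all devices."""
    return {model: list(range(num_devices)) for model in MODELS}


def partitioned_placement(
    num_devices: int, groups: List[List[str]]
) -> Dict[str, List[int]]:
    """Hypothetical alternative (an assumption, not the paper's method):
    split the devices evenly into disjoint blocks, one per group of models."""
    if num_devices % len(groups) != 0:
        raise ValueError("num_devices must be divisible by the number of groups")
    per_group = num_devices // len(groups)
    placement: Dict[str, List[int]] = {}
    for i, group in enumerate(groups):
        devices = list(range(i * per_group, (i + 1) * per_group))
        for model in group:
            placement[model] = list(devices)
    return placement


def models_per_device(
    placement: Dict[str, List[int]], num_devices: int
) -> Dict[int, List[str]]:
    """Invert a placement: which models does each device hold a shard of?"""
    return {
        device: sorted(m for m, devs in placement.items() if device in devs)
        for device in range(num_devices)
    }


if __name__ == "__main__":
    colocated = colocated_placement(4)
    split = partitioned_placement(
        4, [["actor", "critic"], ["reference", "reward"]]
    )
    print(models_per_device(colocated, 4))
    print(models_per_device(split, 4))
```

Under the co-located mapping every device carries a shard of all four models, whereas the hypothetical partition leaves each device with only two. That difference in per-device residency is the kind of trade-off a placement framework has to weigh.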
Taxonomy
Topics: Teleoperation and Haptic Systems · Intelligent Tutoring Systems and Adaptive Learning
