An Adaptive Placement and Parallelism Framework for Accelerating RLHF Training

Youshao Xiao; Zhenglei Zhou; Fagui Mao; Weichang Wu; Shangchun Zhao; Lin Ju; Lei Liang; Xiaolu Zhang; Jun Zhou

arXiv:2312.11819·cs.LG·October 15, 2024·1 cites

Open Access

TL;DR

This paper introduces an adaptive model placement framework for RLHF training of large language models. Its flexible placement strategies target the generation bottleneck caused by the fixed Co-located strategy and raise overall training throughput.

Contribution

It proposes two novel model placement strategies, Interleaving and Disaggregated, to optimize resource utilization and training speed in RLHF workflows.

Findings

01. Achieves up to 11x speedup over SOTA methods

02. Reduces memory redundancy and communication costs

03. Enhances training throughput and efficiency

Abstract

Recently, ChatGPT- or InstructGPT-like large language models (LLMs) have made a significant impact in the AI world. Many works have attempted to reproduce InstructGPT's complex training pipeline, namely Reinforcement Learning with Human Feedback (RLHF). However, the mainstream distributed RLHF training methods typically adopt a fixed model placement strategy, referred to as the Co-located strategy. This strategy treats all four interdependent models involved in RLHF as a single entity, distributing them across all devices and applying parallelism techniques designed for a single model, regardless of the workload heterogeneity inherent to each model. As a result, this strategy exacerbates the generation bottlenecks in RLHF training and degrades the overall training efficiency. To address these issues, we propose a flexible model placement framework that offers two general and agile…
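As a rough illustration of what "placement" means here, the sketch below encodes the Co-located strategy described above as a mapping from models to device IDs: every model spans every device. The four model names (actor, critic, reference, reward) are an assumption based on common PPO-style RLHF pipelines, since the abstract does not name them. The `split_placement` function is a purely hypothetical contrast and is not the paper's Interleaving or Disaggregated strategy, whose details are not given in this excerpt.

```python
from typing import Dict, List

# Assumed names for the four RLHF models; the abstract does not list them.
MODELS = ["actor", "critic", "reference", "reward"]


def colocated_placement(num_devices: int) -> Dict[str, List[int]]:
    """Co-located strategy as described in the abstract:
    each of the four models is distributed across all devices."""
    all_devices = list(range(num_devices))
    return {model: list(all_devices) for model in MODELS}


def split_placement(num_devices: int, train_share: float = 0.5) -> Dict[str, List[int]]:
    """Hypothetical contrast (NOT the paper's method): trainable models
    on one device group, frozen models on the remaining devices."""
    if num_devices < 2:
        raise ValueError("need at least two devices to form two groups")
    # Clamp the cut so that both groups are non-empty.
    cut = max(1, min(num_devices - 1, int(num_devices * train_share)))
    train_group = list(range(cut))
    frozen_group = list(range(cut, num_devices))
    return {
        "actor": train_group,
        "critic": train_group,
        "reference": frozen_group,
        "reward": frozen_group,
    }


def models_per_device(placement: Dict[str, List[int]]) -> Dict[int, int]:
    """Count how many models hold a shard on each device."""
    counts: Dict[int, int] = {}
    for devices in placement.values():
        for device in devices:
            counts[device] = counts.get(device, 0) + 1
    return counts
```

Under these assumptions, `models_per_device(colocated_placement(8))` reports four models on every device, while the hypothetical split reports two per device, which is one simple way to see why a single fixed placement ignores differences between the models' workloads.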

Taxonomy

Topics: Teleoperation and Haptic Systems · Intelligent Tutoring Systems and Adaptive Learning