HybridFlow: A Flexible and Efficient RLHF Framework

Guangming Sheng; Chi Zhang; Zilingfeng Ye; Xibin Wu; Wang Zhang; Ru Zhang; Yanghua Peng; Haibin Lin; Chuan Wu

arXiv:2409.19256·cs.LG·October 3, 2024

TL;DR

HybridFlow introduces a hybrid control framework for RLHF that makes distributed LLM training and generation more flexible and efficient, improving throughput by up to 20.57× over baseline systems.

Contribution

It proposes a novel hybrid control paradigm combining single- and multi-controller approaches for RLHF dataflow execution, with hierarchical APIs and a 3D-HybridEngine for optimized resharding.

Findings

01. Achieves up to 20.57× throughput improvement over baselines
02. Enables flexible mapping of RLHF computation onto various devices
03. Reduces communication overhead and memory redundancy

Abstract

Reinforcement Learning from Human Feedback (RLHF) is widely used in Large Language Model (LLM) alignment. Traditional RL can be modeled as a dataflow, where each node represents computation of a neural network (NN) and each edge denotes data dependencies between the NNs. RLHF complicates the dataflow by expanding each node into a distributed LLM training or generation program, and each edge into a many-to-many multicast. Traditional RL frameworks execute the dataflow using a single controller to instruct both intra-node computation and inter-node communication, which can be inefficient in RLHF due to large control dispatch overhead for distributed intra-node computation. Existing RLHF systems adopt a multi-controller paradigm, which can be inflexible due to nesting distributed computation and data communication. We propose HybridFlow, which combines single-controller and…
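The dataflow view described in the abstract can be illustrated with a small sketch. It models one RLHF iteration as a dependency graph and groups the nodes into stages that could run concurrently. The node names (a PPO-style actor, critic, reference, and reward model) and the `execution_stages` helper are assumptions made for illustration; they do not come from the abstract and are not HybridFlow's API. In a real RLHF system each node would be a distributed LLM program rather than a single function call.

```python
from graphlib import TopologicalSorter

# Hypothetical PPO-style RLHF dataflow (node names are assumptions).
# Each key is a node; its value is the set of nodes whose output it needs,
# so every (dependency -> node) pair is one data-dependency edge.
RLHF_DATAFLOW = {
    "actor_generate": set(),
    "reference_logprob": {"actor_generate"},
    "reward_score": {"actor_generate"},
    "critic_value": {"actor_generate"},
    "compute_advantage": {"reference_logprob", "reward_score", "critic_value"},
    "actor_update": {"compute_advantage"},
    "critic_update": {"compute_advantage"},
}


def execution_stages(graph):
    """Group nodes into stages; nodes within a stage do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for a stable order
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = execution_stages(RLHF_DATAFLOW)
for index, stage in enumerate(stages):
    print(f"stage {index}: {', '.join(stage)}")
```

In this sketch a single loop decides what runs next, which mirrors the single-controller style the abstract attributes to traditional RL frameworks. The overhead the abstract mentions arises when each node is itself a multi-device program that such a controller must also instruct.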
