HybridFlow: A Flexible and Efficient RLHF Framework
Guangming Sheng, Chi Zhang, Zilingfeng Ye, Xibin Wu, Wang Zhang, Ru Zhang, Yanghua Peng, Haibin Lin, Chuan Wu

TL;DR
HybridFlow is an RLHF framework built on a hybrid of single-controller and multi-controller execution. It makes distributed LLM training and generation more flexible to program and more efficient to run, with up to 20.57× higher throughput than the baseline systems.
Contribution
It proposes a novel hybrid control paradigm combining single- and multi-controller approaches for RLHF dataflow execution, with hierarchical APIs and a 3D-HybridEngine for optimized resharding.
Findings
Achieves up to 20.57× throughput improvement over baselines
Enables flexible mapping of RLHF computation onto various devices
Reduces communication overhead and memory redundancy
Abstract
Reinforcement Learning from Human Feedback (RLHF) is widely used in Large Language Model (LLM) alignment. Traditional RL can be modeled as a dataflow, where each node represents computation of a neural network (NN) and each edge denotes data dependencies between the NNs. RLHF complicates the dataflow by expanding each node into a distributed LLM training or generation program, and each edge into a many-to-many multicast. Traditional RL frameworks execute the dataflow using a single controller to instruct both intra-node computation and inter-node communication, which can be inefficient in RLHF due to large control dispatch overhead for distributed intra-node computation. Existing RLHF systems adopt a multi-controller paradigm, which can be inflexible due to nesting distributed computation and data communication. We propose HybridFlow, which combines single-controller and…
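The dataflow view described in the abstract can be illustrated with a minimal sketch. This is not HybridFlow's API: the node names below are hypothetical and assume a typical PPO-style RLHF loop (the abstract itself does not name the models), and the scheduler is just Python's standard-library topological sorter. It only shows how nodes (model computations) and edges (data dependencies) determine which computations may run concurrently.

```python
from graphlib import TopologicalSorter

# Hypothetical PPO-style RLHF dataflow (node names are assumptions, not from the paper).
# Each key is a node; its value is the set of nodes whose output it consumes.
dataflow = {
    "actor_generate": set(),
    "reward_score": {"actor_generate"},
    "reference_logprob": {"actor_generate"},
    "critic_value": {"actor_generate"},
    "compute_advantage": {"reward_score", "reference_logprob", "critic_value"},
    "actor_update": {"compute_advantage"},
    "critic_update": {"compute_advantage"},
}


def execution_stages(graph):
    """Group nodes into stages; nodes within one stage do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every node whose inputs are all available
        stages.append(ready)
        sorter.done(*ready)                 # mark the stage as finished
    return stages


stages = execution_stages(dataflow)
for index, stage in enumerate(stages):
    print(index, stage)
```

In an RLHF system each node in this graph would itself be a distributed LLM program spread over many devices, and each edge a many-to-many transfer between those device groups, which is the complication the abstract points to.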
Code & Models
- 🤗 speakleash/Bielik-Minitron-7B-v3.0-Instruct (model) · 3.7k dl · ♡ 17
- 🤗 zswzswzsw/GAO_grpo (model)
- 🤗 zswzswzsw/verl_subquestion (model)
- 🤗 speakleash/Bielik-11B-v3.0-Instruct (model) · 369k dl · ♡ 56
- 🤗 hour1/collabllm (model)
- 🤗 atad-tokyo/GST_VERL (model)
- 🤗 safestack/Bielik-11B-v3.0-Instruct (model) · 22 dl
- 🤗 gepardzik/Bielik-11B-v3.0-Instruct-heretic-MPOA (model) · 1 dl
- 🤗 websystemspl/Bielik-11B-v3.0-Instruct-128k (model) · 2 dl
Taxonomy
Methods: Sparse Evolutionary Training
