HetRL: Efficient Reinforcement Learning for LLMs in Heterogeneous Environments
Yongjun He, Shuai Zhang, Jiading Gai, Xiyuan Zhang, Boran Han, Bernie Wang, Huzefa Rangwala, George Karypis

TL;DR
HetRL is a distributed system that optimizes reinforcement learning training for large language models across heterogeneous GPU environments, improving throughput by up to 9.17x over existing systems.
Contribution
It introduces a novel scheduling framework with hybrid and ILP-based algorithms to efficiently manage complex RL training workflows on diverse hardware.
Findings
HetRL achieves up to 9.17x throughput improvement over existing systems.
The system demonstrates robust performance across various workloads and settings.
The evaluation consumed 20,000 GPU-hours of compute.
Abstract
As large language models (LLMs) continue to scale and new GPUs are released even more frequently, there is an increasing demand for LLM post-training in heterogeneous environments to fully leverage underutilized mid-range or previous-generation GPUs and alleviate the shortage of homogeneous high-end GPUs within a single availability zone. However, achieving high-performance reinforcement learning (RL) training for LLMs on such computing resources remains challenging because the workflow involves multiple models and tasks with complex computation and data dependencies. In this paper, we present HetRL, a distributed system for efficient RL training in infrastructures with heterogeneous GPUs and networks. HetRL formulates the scheduling of RL training in heterogeneous environments as a constrained joint optimization problem and provides two complementary approaches for addressing this…
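To make the idea of scheduling as a constrained optimization problem concrete, here is a toy sketch. It is not HetRL's algorithm: the pool names, task names, speeds, and memory figures are all hypothetical, tasks placed on the same pool are assumed to run back to back, and data dependencies and network costs are ignored. It exhaustively searches task-to-pool assignments, rejects any that exceed a pool's memory, and keeps the one with the shortest iteration time.

```python
from itertools import product

# Hypothetical GPU pools: relative speed and total memory (GB).
GPU_POOLS = {
    "fast": {"speed": 2.0, "mem_gb": 80},
    "slow": {"speed": 1.0, "mem_gb": 24},
}

# Hypothetical RL tasks: work units per iteration and memory footprint (GB).
TASKS = {
    "generation": {"work": 8.0, "mem_gb": 20},
    "reward": {"work": 2.0, "mem_gb": 16},
    "training": {"work": 6.0, "mem_gb": 60},
}


def iteration_time(assignment):
    """Return the busiest pool's load, or None if a memory cap is exceeded."""
    load = {pool: 0.0 for pool in GPU_POOLS}
    mem = {pool: 0.0 for pool in GPU_POOLS}
    for task, pool in assignment.items():
        # Assumption: tasks sharing a pool are serialized on it.
        load[pool] += TASKS[task]["work"] / GPU_POOLS[pool]["speed"]
        mem[pool] += TASKS[task]["mem_gb"]
    if any(mem[pool] > GPU_POOLS[pool]["mem_gb"] for pool in GPU_POOLS):
        return None  # infeasible: violates the memory constraint
    return max(load.values())


def best_assignment():
    """Brute-force search over every task-to-pool assignment."""
    best, best_time = None, float("inf")
    names = list(TASKS)
    for pools in product(GPU_POOLS, repeat=len(names)):
        assignment = dict(zip(names, pools))
        t = iteration_time(assignment)
        if t is not None and t < best_time:
            best, best_time = assignment, t
    return best, best_time


if __name__ == "__main__":
    print(best_assignment())
```

Exhaustive search is only workable at this toy scale, since the number of assignments grows exponentially with the number of tasks; that growth is one reason a real scheduler needs a more efficient search strategy.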
