Highly Parallelized Reinforcement Learning Training with Relaxed Assignment Dependencies

Zhouyu He; Peng Qiao; Rongchun Li; Yong Dou; Yusong Tan

arXiv:2502.20190·cs.LG·February 28, 2025

TL;DR

This paper introduces TianJi, a high-throughput distributed reinforcement learning training system that relaxes assignment dependencies between subtask components and enables event-driven asynchronous communication. It reports up to 4.37x faster convergence and better data transmission efficiency than existing systems.

Contribution

TianJi is the first system to relax assignment dependencies in DRL training, improving parallelization, scalability, and convergence speed while maintaining convergence guarantees.

Findings

01 Converges up to 4.37x faster.

02 Scales to 8 nodes, with a 1.6x convergence speedup and a 7.13x throughput speedup.

03 Outperforms existing systems in data transmission efficiency.

Abstract

As the demands for superior agents grow, the training complexity of Deep Reinforcement Learning (DRL) becomes higher. Thus, accelerating training of DRL has become a major research focus. Dividing the DRL training process into subtasks and using parallel computation can effectively reduce training costs. However, current DRL training systems lack sufficient parallelization due to data assignment between subtask components. This assignment issue has been ignored, but addressing it can further boost training efficiency. Therefore, we propose a high-throughput distributed RL training system called TianJi. It relaxes assignment dependencies between subtask components and enables event-driven asynchronous communication. Meanwhile, TianJi maintains clear boundaries between subtask components. To address convergence uncertainty from relaxed assignment dependencies, TianJi proposes a…
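
The abstract describes relaxing assignment dependencies so that subtask components communicate asynchronously and react to events. A minimal Python sketch of that general idea, not TianJi's actual implementation, might look like the following: hypothetical `actor` threads hand samples to an unbounded `queue.Queue` and continue immediately, while a hypothetical `learner` wakes whenever a sample arrives. All names (`actor`, `learner`, `run`) and the toy `(actor_id, step)` samples are assumptions made for illustration.

```python
import queue
import threading


def actor(actor_id, out_queue, num_steps):
    """Hypothetical actor: produces samples and never waits for the learner."""
    for step in range(num_steps):
        sample = (actor_id, step)  # stand-in for a trajectory fragment
        out_queue.put(sample)      # unbounded queue, so this does not block


def learner(in_queue, num_expected, consumed):
    """Hypothetical learner: driven by sample arrival, not by a fixed schedule."""
    while len(consumed) < num_expected:
        # get() blocks only until the next sample is available
        consumed.append(in_queue.get())


def run(num_actors=3, num_steps=4):
    """Start one learner and several actors, then return what the learner saw."""
    samples = queue.Queue()
    consumed = []
    total = num_actors * num_steps

    learner_thread = threading.Thread(
        target=learner, args=(samples, total, consumed)
    )
    actor_threads = [
        threading.Thread(target=actor, args=(i, samples, num_steps))
        for i in range(num_actors)
    ]

    learner_thread.start()
    for t in actor_threads:
        t.start()
    for t in actor_threads:
        t.join()
    learner_thread.join()
    return consumed
```

Calling `run(3, 4)` returns 12 samples. The interleaving across actors can vary between runs, but each actor's own samples stay in order because the queue is FIFO.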

Code & Models

Repositories

hiprl/tianji
PyTorch · Official

Videos

Highly Parallelized Reinforcement Learning Training with Relaxed Assignment Dependencies · Underline

Taxonomy

Topics: Reinforcement Learning in Robotics · Software-Defined Networks and 5G · Domain Adaptation and Few-Shot Learning