Parallel $Q$-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation

Zechu Li; Tao Chen; Zhang-Wei Hong; Anurag Ajay; Pulkit Agrawal

arXiv:2307.12983·cs.LG·July 25, 2023

TL;DR

This paper introduces Parallel $Q$-Learning (PQL), an off-policy reinforcement learning method that uses massively parallel GPU simulation to train faster than PPO in wall-clock time while retaining the sample efficiency of off-policy learning.

Contribution

The paper proposes a novel Parallel Q-Learning scheme optimized for GPU-based simulation, enabling scalable off-policy learning on a single workstation.

Findings

01. Q-learning scaled to tens of thousands of environments
02. Outperforms PPO in wall-clock training time
03. Maintains superior sample efficiency

Abstract

Reinforcement learning is time-consuming for complex tasks due to the need for large amounts of training data. Recent advances in GPU-based simulation, such as Isaac Gym, have sped up data collection thousands of times on a commodity GPU. Most prior works used on-policy methods like PPO due to their simplicity and ease of scaling. Off-policy methods are more data efficient but challenging to scale, resulting in a longer wall-clock training time. This paper presents a Parallel $Q$-Learning (PQL) scheme that outperforms PPO in wall-clock time while maintaining superior sample efficiency of off-policy learning. PQL achieves this by parallelizing data collection, policy learning, and value learning. Different from prior works on distributed off-policy learning, such as Apex, our scheme is designed specifically for massively parallel GPU-based simulation and optimized to work on a single…
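The abstract states that PQL parallelizes data collection, policy learning, and value learning. The following is a minimal structural sketch of that idea, not the authors' implementation: three Python threads share a replay buffer, one stepping a batch of toy environments, one updating a tabular $Q$ estimate, and one updating a policy table from it. The chain environment, the tabular updates, the worker names (`actor`, `value_learner`, `policy_learner`), and all constants are assumptions made for illustration. Python threads also do not run in true parallel, so this shows only how the three roles are decoupled.

```python
import random
import threading
import time
from collections import deque

# Hypothetical toy settings, chosen only for illustration.
N_ENVS, N_STATES, GAMMA, LR = 64, 5, 0.9, 0.1

buffer = deque(maxlen=10_000)                # shared replay buffer
lock = threading.Lock()                      # guards buffer, q, pi, steps
q = [[0.0, 0.0] for _ in range(N_STATES)]    # q[s][a]; a=0 left, a=1 right
pi = [0] * N_STATES                          # deterministic policy table
steps = {"actor": 0, "value": 0, "policy": 0}


def env_step(s, a):
    """Toy chain: reward 1 for reaching the last state, which ends the episode."""
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done


def actor(n_steps, rng):
    """Data collection: step all environments as one batch per iteration."""
    states = [0] * N_ENVS
    for _ in range(n_steps):
        with lock:
            policy = list(pi)                # snapshot of the latest policy
        batch = []
        for i, s in enumerate(states):
            # Epsilon-greedy exploration around the current policy.
            a = policy[s] if rng.random() > 0.3 else rng.randrange(2)
            s2, r, done = env_step(s, a)
            batch.append((s, a, r, s2, done))
            states[i] = 0 if done else s2    # reset finished environments
        with lock:
            buffer.extend(batch)
            steps["actor"] += 1


def sample(rng, k=32):
    """Wait until the buffer holds k transitions, then sample uniformly."""
    while True:
        with lock:
            if len(buffer) >= k:
                return rng.sample(list(buffer), k)
        time.sleep(0.001)


def value_learner(n_updates, rng):
    """Value learning: bootstrap from the action the current policy picks."""
    for _ in range(n_updates):
        batch = sample(rng)
        with lock:
            for s, a, r, s2, done in batch:
                target = r + (0.0 if done else GAMMA * q[s2][pi[s2]])
                q[s][a] += LR * (target - q[s][a])
            steps["value"] += 1


def policy_learner(n_updates, rng):
    """Policy learning: move sampled states to the greedy action under q."""
    for _ in range(n_updates):
        batch = sample(rng)
        with lock:
            for s, *_ in batch:
                pi[s] = 0 if q[s][0] > q[s][1] else 1
            steps["policy"] += 1


def run(n=200):
    """Start the three workers concurrently and wait for all to finish."""
    workers = [
        threading.Thread(target=actor, args=(n, random.Random(0))),
        threading.Thread(target=value_learner, args=(n, random.Random(1))),
        threading.Thread(target=policy_learner, args=(n, random.Random(2))),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return dict(steps)
```

Calling `run(200)` executes 200 iterations of each worker and returns the per-worker step counts. The point of the structure is that the data collector never waits for either learner, so simulation throughput is not bounded by update speed.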

Videos

Parallel $Q$-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Advanced Neural Network Applications

Methods: Entropy Regularization · Proximal Policy Optimization