Continuous Control Reinforcement Learning: Distributed Distributional DrQ Algorithms

Zehao Zhou

arXiv:2404.10645·cs.LG·April 17, 2024·2 cites

Open Access

TL;DR

This paper introduces Distributed Distributional DrQ, an advanced off-policy reinforcement learning algorithm for continuous control, enhancing performance through distributional value functions and distributed actor policies.

Contribution

It extends DrQ-v2 by integrating distributional critics and distributed actors, achieving improved results in challenging continuous control tasks.

Findings

01

Outperforms previous algorithms on several benchmarks

02

More expressive value functions

03

Better handling of high-dimensional control tasks

Abstract

Distributed Distributional DrQ is a model-free, off-policy RL algorithm for continuous control tasks that learns from the agent's states and observations. It is an actor-critic method that combines data augmentation with a distributional view of the critic's value function. Its aim is to learn to control the agent and master tasks in a high-dimensional continuous space. DrQ-v2 uses DDPG as its backbone and achieves superior performance on various continuous control tasks. Distributed Distributional DrQ instead uses Distributed Distributional DDPG as its backbone; this modification aims to improve performance on some hard continuous control tasks through the greater expressive power of a distributional value function and distributed actor policies.
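
To make the "distributional perspective of critic value function" concrete, here is a minimal sketch of one common way to build a distributional critic target: a categorical return distribution over fixed atoms, projected after a Bellman backup, as in the published Distributed Distributional DDPG (D4PG) work. The abstract does not state which parameterization the paper uses, so the categorical form, the function names, and the support bounds `v_min` and `v_max` are assumptions chosen for illustration.

```python
import math


def project_categorical(next_probs, reward, discount, v_min, v_max):
    """Project the backed-up distribution r + discount * z onto fixed atoms.

    Hypothetical helper: next_probs[j] is the target critic's probability
    for atom z_j, where the atoms are evenly spaced on [v_min, v_max].
    """
    n = len(next_probs)
    delta = (v_max - v_min) / (n - 1)  # spacing between neighbouring atoms
    projected = [0.0] * n
    for j, p in enumerate(next_probs):
        z = v_min + j * delta
        # Bellman backup of a single atom, clipped to the support.
        tz = min(max(reward + discount * z, v_min), v_max)
        # Fractional index of the backed-up atom; clamp guards float error.
        b = min(max((tz - v_min) / delta, 0.0), n - 1)
        lo, hi = math.floor(b), math.ceil(b)
        w_hi = b - lo  # equals 0 when b lands exactly on an atom
        # Split the probability mass between the two nearest atoms.
        projected[lo] += p * (1.0 - w_hi)
        projected[hi] += p * w_hi
    return projected


def critic_loss(target_probs, predicted_probs, eps=1e-8):
    """Cross-entropy between the projected target and the critic's output."""
    return -sum(t * math.log(q + eps)
                for t, q in zip(target_probs, predicted_probs))


# Usage: five atoms on [-1, 1], uniform next-state distribution,
# terminal transition (discount 0) with reward 0.25.
target = project_categorical([0.2] * 5, reward=0.25, discount=0.0,
                             v_min=-1.0, v_max=1.0)
loss = critic_loss(target, [0.2] * 5)
```

In this usage example the reward 0.25 falls halfway between the atoms 0.0 and 0.5, so the projected target places its mass on those two atoms in equal shares. A scalar critic would regress only to the mean of that target; the distributional critic is trained to match its full shape, which is the added expressive power the abstract refers to.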

Taxonomy

Topics: Extremum Seeking Control Systems · Iterative Learning Control Systems · Traffic control and management

Methods: Adam · Convolution · Batch Normalization · Weight Decay · Experience Replay · Dense Connections · Deep Deterministic Policy Gradient