Discretizing Continuous Action Space with Unimodal Probability Distributions for On-Policy Reinforcement Learning

Yuanyang Zhu; Zhi Wang; Yuanheng Zhu; Chunlin Chen; Dongbin Zhao

arXiv:2408.00309 · cs.LG · August 2, 2024

TL;DR

This paper introduces a unimodal discrete policy architecture using Poisson distributions for on-policy reinforcement learning, improving convergence speed and stability in complex control tasks by better leveraging the continuity of the action space.

Contribution

The paper proposes a novel unimodal discrete policy architecture with Poisson distributions that reduces variance and enhances learning stability in continuous control tasks.

Findings

01. Faster convergence in complex control tasks
02. Higher performance in challenging environments
03. Lower variance in policy gradient estimates

Abstract

For on-policy reinforcement learning, discretizing action space for continuous control can easily express multiple modes and is straightforward to optimize. However, without considering the inherent ordering between the discrete atomic actions, the explosion in the number of discrete actions can possess undesired properties and induce a higher variance for the policy gradient estimator. In this paper, we introduce a straightforward architecture that addresses this issue by constraining the discrete policy to be unimodal using Poisson probability distributions. This unimodal architecture can better leverage the continuity in the underlying continuous action space using explicit unimodal probability distributions. We conduct extensive experiments to show that the discrete policy with the unimodal probability distribution provides significantly faster convergence and higher performance for…
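The core idea in the abstract, a discrete policy whose bin probabilities are forced to be unimodal through a Poisson distribution, can be illustrated with a small sketch. This is not the authors' implementation: it assumes one plausible construction, in which a single positive rate (here a plain number standing in for a network output) defines a Poisson pmf truncated to the available bins and renormalized with a temperature-scaled softmax. The names `unimodal_poisson_probs`, `bin_to_action`, `is_unimodal`, and the `temperature` parameter are hypothetical and chosen for illustration only.

```python
import math


def unimodal_poisson_probs(rate, num_bins, temperature=1.0):
    """Return probabilities over bins 0..num_bins-1 from a truncated Poisson pmf.

    Assumption: `rate` stands in for a positive scalar produced by a policy
    network for one action dimension. The log-pmf is divided by `temperature`
    and renormalized with a softmax over the available bins.
    """
    if rate <= 0 or temperature <= 0 or num_bins < 2:
        raise ValueError("rate and temperature must be positive, num_bins >= 2")
    # Poisson log-pmf: k*log(rate) - rate - log(k!)
    logits = [
        (k * math.log(rate) - rate - math.lgamma(k + 1)) / temperature
        for k in range(num_bins)
    ]
    # Numerically stable softmax over the truncated support.
    peak = max(logits)
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def bin_to_action(k, num_bins, low=-1.0, high=1.0):
    """Map an ordered bin index to an evenly spaced point in [low, high]."""
    return low + (high - low) * k / (num_bins - 1)


def is_unimodal(probs):
    """True if the sequence never increases again after it starts decreasing."""
    falling = False
    for prev, cur in zip(probs, probs[1:]):
        if cur < prev:
            falling = True
        elif cur > prev and falling:
            return False
    return True


probs = unimodal_poisson_probs(rate=4.2, num_bins=11)
mode_bin = probs.index(max(probs))
mode_action = bin_to_action(mode_bin, num_bins=11)
```

Because the Poisson pmf is log-concave, truncating it, rescaling its log by a positive temperature, and renormalizing all preserve a single peak, so neighbouring bins receive similar probability. An unconstrained categorical policy has no such guarantee, which is the ordering information the abstract says is otherwise ignored.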

Code & Models

Repositories

zhuyuanyang/udprl (tf, Official)

Taxonomy

Topics: Reinforcement Learning in Robotics