Normality-Guided Distributional Reinforcement Learning for Continuous Control

Ju-Seung Byun; Andrew Perrault

arXiv:2208.13125·cs.LG·July 8, 2025

TL;DR

This paper introduces a normality-guided distributional reinforcement learning method for continuous control, leveraging the empirical normality of value distributions to improve performance and training efficiency.

Contribution

The paper proposes an approach that exploits the near-normality of the learned value distribution, using a variance network and a new policy update strategy. The approach is compatible with many existing DRL structures.

Findings

1. Significant performance improvements in 10 out of 16 tasks.

2. Faster training times with fewer parameters compared to ensemble methods.

3. Effective use of normal distribution assumptions in continuous control environments.

Abstract

Learning a predictive model of the mean return, or value function, plays a critical role in many reinforcement learning algorithms. Distributional reinforcement learning (DRL) has been shown to improve performance by modeling the value distribution, not just the mean. We study the value distribution in several continuous control tasks and find that the learned value distribution is empirically quite close to normal. We design a method that exploits this property, employing variances predicted from a variance network, along with returns, to analytically compute target quantile bars representing a normal for our distributional value function. In addition, we propose a policy update strategy based on the correctness as measured by structural characteristics of the value distribution not present in the standard value function. The approach we outline is compatible with many DRL structures.…
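The abstract's step of analytically computing target quantiles for a normal from a return and a predicted variance can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `normal_quantile_targets`, its parameters, and the choice of midpoint quantile levels tau_i = (2i + 1) / (2N) are assumptions made here for illustration; the abstract does not specify them.

```python
from statistics import NormalDist


def normal_quantile_targets(mean_return, variance, n_quantiles):
    """Return n_quantiles target values for a normal N(mean_return, variance).

    Assumption (not taken from the paper): quantile levels are the midpoints
    tau_i = (2i + 1) / (2N), a common choice in quantile-based methods.
    """
    if variance < 0:
        raise ValueError("variance must be non-negative")
    if n_quantiles < 1:
        raise ValueError("n_quantiles must be at least 1")

    std = variance ** 0.5
    standard_normal = NormalDist()  # mean 0, standard deviation 1

    targets = []
    for i in range(n_quantiles):
        tau = (2 * i + 1) / (2 * n_quantiles)  # midpoint of the i-th bin
        z = standard_normal.inv_cdf(tau)       # standard normal quantile
        targets.append(mean_return + std * z)  # shift and scale
    return targets


if __name__ == "__main__":
    # Hypothetical inputs: a return estimate of 10.0 with predicted variance 4.0.
    for value in normal_quantile_targets(10.0, 4.0, 5):
        print(round(value, 3))
```

In a setting like the one the abstract describes, the mean would presumably come from observed returns and the variance from the variance network; here both are plain floats so the sketch stays self-contained.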

Taxonomy

Topics: Reinforcement Learning in Robotics · Insect behavior and control techniques

Methods: Entropy Regularization · Proximal Policy Optimization · Trust Region Policy Optimization