Normality-Guided Distributional Reinforcement Learning for Continuous Control
Ju-Seung Byun, Andrew Perrault

TL;DR
This paper introduces a normality-guided distributional reinforcement learning method for continuous control, leveraging the empirical normality of value distributions to improve performance and training efficiency.
Contribution
The paper proposes an approach that exploits the near-normal shape of the learned value distribution, combining a variance network with a new policy update strategy. The approach is compatible with existing DRL algorithms.
Findings
Significant performance improvements in 10 out of 16 tasks.
Faster training times with fewer parameters compared to ensemble methods.
Effective use of normal distribution assumptions in continuous control environments.
Abstract
Learning a predictive model of the mean return, or value function, plays a critical role in many reinforcement learning algorithms. Distributional reinforcement learning (DRL) has been shown to improve performance by modeling the value distribution, not just the mean. We study the value distribution in several continuous control tasks and find that the learned value distribution is empirically quite close to normal. We design a method that exploits this property, employing variances predicted from a variance network, along with returns, to analytically compute target quantile bars representing a normal for our distributional value function. In addition, we propose a policy update strategy based on the correctness as measured by structural characteristics of the value distribution not present in the standard value function. The approach we outline is compatible with many DRL structures.…
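As a rough illustration of the idea in the abstract, the sketch below computes quantile targets for a normal distribution from a return estimate and a predicted variance. It is not the authors' implementation. The function name `normal_quantile_targets`, its parameters, and the choice of midpoint quantile fractions tau_i = (2i + 1) / (2N) are assumptions made for this example; the summary does not state which quantile fractions the paper uses.

```python
from statistics import NormalDist


def normal_quantile_targets(mean_return, variance, num_quantiles):
    """Quantile values of N(mean_return, variance) at midpoint fractions.

    Hypothetical helper for illustration only. The fractions
    tau_i = (2i + 1) / (2N) are an assumed convention, not taken
    from the paper.
    """
    if num_quantiles < 1:
        raise ValueError("num_quantiles must be at least 1")
    if variance < 0:
        raise ValueError("variance must be non-negative")

    std = variance ** 0.5
    if std == 0.0:
        # Degenerate case: all probability mass sits on the mean.
        return [float(mean_return)] * num_quantiles

    dist = NormalDist(mu=mean_return, sigma=std)
    taus = [(2 * i + 1) / (2 * num_quantiles) for i in range(num_quantiles)]
    # inv_cdf maps each fraction tau to the value below which a
    # fraction tau of the distribution's mass lies.
    return [dist.inv_cdf(tau) for tau in taus]


# Example: a return estimate of 10.0 with predicted variance 4.0.
targets = normal_quantile_targets(10.0, 4.0, 5)
```

The resulting targets are increasing and symmetric about the mean, so a distributional value function regressed onto them is pushed toward a normal shape whose spread is set by the predicted variance.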
Taxonomy
Topics: Reinforcement Learning in Robotics · Insect behavior and control techniques
Methods: Entropy Regularization · Proximal Policy Optimization · Trust Region Policy Optimization
