Learning Policies through Quantile Regression
Oliver Richter, Roger Wattenhofer

TL;DR
This paper introduces a reinforcement learning approach that uses advantage-weighted quantile regression to model policies implicitly, so the agent can approximate complex, non-Gaussian action distributions instead of being restricted to a fixed parametric family.
Contribution
It proposes a new policy optimization method that models policies implicitly via quantile regression, overcoming limitations of parametric distributions in policy gradient algorithms.
Findings
Achieves results comparable or superior to those of state-of-the-art methods on the MuJoCo benchmarks.
Allows modeling of complex, non-Gaussian policies in continuous action spaces.
Demonstrates flexibility and improved performance over traditional Gaussian policy parameterizations.
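To make the idea of an implicitly modeled, non-Gaussian policy concrete, the sketch below draws actions by pushing uniform samples through a quantile function (inverse transform sampling). The function `bimodal_quantile` and its constants are hypothetical and hand-written for illustration; in the paper the quantile function would be represented by a neural network conditioned on the state, which is not shown here.

```python
import numpy as np

def bimodal_quantile(tau):
    """Hypothetical quantile function of a two-mode action distribution.

    Quantile levels below 0.5 map to actions in [-1.0, -0.8),
    levels at or above 0.5 map to actions in [0.8, 1.0).
    The mapping is non-decreasing in tau, as a quantile function must be.
    """
    tau = np.asarray(tau, dtype=float)
    return np.where(tau < 0.5, -1.0 + 0.4 * tau, 1.0 + 0.4 * (tau - 1.0))

def sample_actions(quantile_fn, n, rng):
    """Draw n actions: sample tau ~ U(0, 1), then evaluate the quantile function."""
    tau = rng.uniform(size=n)
    return quantile_fn(tau)

rng = np.random.default_rng(0)
actions = sample_actions(bimodal_quantile, 1000, rng)
```

No single Gaussian can place its mass on two separated intervals like this, which is the kind of case the findings above refer to.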
Abstract
Policy-gradient-based reinforcement learning algorithms coupled with neural networks have shown success in learning complex policies in the model-free continuous action space control setting. However, explicitly parameterized policies are limited by the scope of the chosen parametric probability distribution. We show that, as an alternative to the likelihood-based policy gradient, a related objective can be optimized through advantage-weighted quantile regression. Our approach models the policy implicitly in the network, which gives the agent the freedom to approximate any distribution in each action dimension, not limiting its capabilities to the commonly used unimodal Gaussian parameterization. This broader spectrum of policies makes our algorithm suitable for problems where Gaussian policies cannot fit the optimal policy. Moreover, our results on the MuJoCo physics simulator benchmarks…
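As a rough illustration of what "advantage-weighted quantile regression" could look like, here is a minimal sketch built on the standard quantile regression (pinball) loss. It is not taken from the paper: the function names are hypothetical, and clipping the advantage at zero so that only better-than-average actions are regressed toward is a simplifying assumption, not necessarily the authors' weighting scheme.

```python
import numpy as np

def pinball_loss(pred, target, tau):
    """Standard quantile regression loss for quantile level tau in (0, 1).

    Under-predictions (target > pred) are penalized with weight tau,
    over-predictions with weight (1 - tau).
    """
    diff = target - pred
    return np.maximum(tau * diff, (tau - 1.0) * diff)

def advantage_weighted_quantile_loss(pred, action, tau, advantage):
    """Mean pinball loss between predicted quantiles and taken actions,
    weighted per sample by the advantage.

    Assumption (not from the paper): negative advantages are clipped to
    zero, so such samples contribute nothing to the loss.
    """
    weights = np.maximum(advantage, 0.0)
    return np.mean(weights * pinball_loss(pred, action, tau))

# Two samples: one action with positive advantage, one with negative.
pred = np.zeros(2)                 # predicted tau-quantile of the action
action = np.array([1.0, 1.0])      # actions actually taken
tau = np.array([0.9, 0.9])         # sampled quantile levels
advantage = np.array([2.0, -1.0])  # estimated advantages
loss = advantage_weighted_quantile_loss(pred, action, tau, advantage)
```

In this toy batch only the first sample contributes, so minimizing the loss pulls the predicted 0.9-quantile toward the action that had positive advantage.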
Taxonomy
Topics: Reinforcement Learning in Robotics · Gaussian Processes and Bayesian Inference · Model Reduction and Neural Networks
