PACER: A Fully Push-forward-based Distributional Reinforcement Learning Algorithm
Wensong Bai, Chao Zhang, Yichao Fu, Peilin Zhao, Hui Qian, Bin Dai

TL;DR
PACER is a novel distributional reinforcement learning algorithm that uses push-forward operators for both the critic and the actor. This improves exploration and policy modeling, and the algorithm achieves superior empirical performance.
Contribution
It introduces the first fully push-forward-based RL algorithm, along with novel sample-based regularizers and a stochastic utility value policy gradient that broaden exploration of the policy space.
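A push-forward policy has no tractable density, so density-based bonuses such as policy entropy cannot be computed directly, and any exploration regularizer has to work from action samples alone. The summary does not say which sample-based regularizers PACER uses. The sketch below is a hypothetical stand-in: it scores a batch of actions by its squared maximum mean discrepancy (MMD) to uniform samples on an assumed action box [-1, 1]^d. The function names, the Gaussian kernel, the bandwidth of 0.5, and the uniform reference are all illustrative assumptions, not the paper's choices.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth):
    """Pairwise k(x, y) = exp(-||x - y||^2 / (2 h^2)) for x: (n, d), y: (m, d)."""
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd_squared(x, y, bandwidth=0.5):
    """Biased (V-statistic) estimate of the squared MMD between two sample sets."""
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


def exploration_penalty(actions, rng, bandwidth=0.5):
    """Hypothetical regularizer (not from the paper): MMD^2 between policy samples
    and uniform samples on [-1, 1]^d. Smaller values mean the samples cover the
    action box more evenly. Only samples are used; no density is evaluated."""
    reference = rng.uniform(-1.0, 1.0, size=actions.shape)
    return mmd_squared(actions, reference, bandwidth)


rng = np.random.default_rng(0)
collapsed = np.full((256, 2), 0.3)         # policy that always outputs one action
spread = rng.uniform(-1.0, 1.0, (256, 2))  # policy that covers the action box
print(exploration_penalty(collapsed, rng) > exploration_penalty(spread, rng))  # True
```

The collapsed policy is penalized far more than the spread-out one, which is the behavior an exploration regularizer should have. In a training loop, such a penalty would presumably be added to the actor loss with some weight; that weighting is likewise an assumption here.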
Findings
PACER outperforms state-of-the-art algorithms in experiments.
The push-forward operator enhances distribution modeling.
Sample-based regularizers improve exploration efficiency.
Abstract
In this paper, we propose the first fully push-forward-based distributional reinforcement learning algorithm, named PACER, which consists of a distributional critic, a stochastic actor and a sample-based encourager. Specifically, the push-forward operator is leveraged in both the critic and the actor to model the return distributions and stochastic policies respectively, giving them equal modeling capability and thus enhancing their synergetic performance. Since it is infeasible to obtain the density function of the push-forward policies, novel sample-based regularizers are integrated into the encourager to incentivize efficient exploration and alleviate the risk of becoming trapped in local optima. Moreover, a sample-based stochastic utility value policy gradient is established for the push-forward policy update, which circumvents the explicit need for the policy density function in existing…
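To make the push-forward idea concrete: each network takes independent noise as an extra input, so drawing an action or a return sample is a single forward pass, while the density of the induced distribution is never written down. The sketch below assumes NumPy, randomly initialized two-layer MLPs, tanh squashing of actions, and Gaussian noise. The dimensions, the architecture, and the helper names (`init_mlp`, `sample_actions`, `sample_returns`) are hypothetical and not taken from the paper; no training is shown.

```python
import numpy as np

STATE_DIM, ACTION_DIM, NOISE_DIM, HIDDEN = 3, 2, 4, 32


def init_mlp(in_dim, out_dim, rng):
    """Randomly initialized two-layer MLP (hypothetical architecture)."""
    return {
        "W1": rng.normal(0.0, 0.5, (in_dim, HIDDEN)),
        "b1": np.zeros(HIDDEN),
        "W2": rng.normal(0.0, 0.5, (HIDDEN, out_dim)),
        "b2": np.zeros(out_dim),
    }


def mlp(params, x):
    """Forward pass for a batch x of shape (n, in_dim)."""
    h = np.tanh(x @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]


rng = np.random.default_rng(0)
actor = init_mlp(STATE_DIM + NOISE_DIM, ACTION_DIM, rng)
critic = init_mlp(STATE_DIM + ACTION_DIM + NOISE_DIM, 1, rng)


def sample_actions(state, n, rng):
    """Push-forward policy: a = tanh(f(s, xi)) with xi ~ N(0, I)."""
    xi = rng.standard_normal((n, NOISE_DIM))
    s = np.broadcast_to(state, (n, STATE_DIM))
    return np.tanh(mlp(actor, np.concatenate([s, xi], axis=1)))


def sample_returns(state, action, n, rng):
    """Push-forward critic: z = g(s, a, eps) with eps ~ N(0, I)."""
    eps = rng.standard_normal((n, NOISE_DIM))
    sa = np.broadcast_to(np.concatenate([state, action]), (n, STATE_DIM + ACTION_DIM))
    return mlp(critic, np.concatenate([sa, eps], axis=1))[:, 0]


state = np.array([0.1, -0.2, 0.3])
actions = sample_actions(state, 5, rng)              # (5, 2) actions squashed by tanh
returns = sample_returns(state, actions[0], 8, rng)  # (8,) return samples
print(actions.shape, returns.shape)
```

Because actor and critic share the same construction (a deterministic network applied to noise), they have the same kind of modeling flexibility, which is the symmetry the abstract emphasizes.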
Taxonomy
Topics: Reinforcement Learning in Robotics
