PACER: A Fully Push-forward-based Distributional Reinforcement Learning Algorithm

Wensong Bai; Chao Zhang; Yichao Fu; Peilin Zhao; Hui Qian; Bin Dai

arXiv:2306.06637 · cs.LG · October 10, 2024 · 1 citation

TL;DR

PACER is a distributional reinforcement learning algorithm that uses push-forward operators in both the critic and the actor, giving the two components equal modeling capability. Sample-based regularizers encourage exploration, and the authors report that PACER outperforms state-of-the-art algorithms in their experiments.

Contribution

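The paper introduces what the authors describe as the first fully push-forward-based distributional RL algorithm, together with novel sample-based regularizers that encourage exploration and a sample-based stochastic utility value policy gradient that updates the policy without requiring its density function.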

Findings

01. PACER outperforms state-of-the-art algorithms in experiments.

02. The push-forward operator enhances distribution modeling.

03. Sample-based regularizers improve exploration efficiency.
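To make the idea of a sample-based regularizer concrete, here is a minimal sketch that scores a set of action samples without ever evaluating a policy density. It assumes a squared maximum mean discrepancy (MMD) with a Gaussian kernel, measured against uniform reference actions; this page does not state which discrepancy PACER uses, so the kernel, the bandwidth, the uniform reference, and the function names `gaussian_kernel` and `mmd_squared` are all illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between sample sets x (n, d) and y (m, d)."""
    sq_dist = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between x and y.

    Only samples are needed; no density of either distribution is evaluated.
    """
    return (gaussian_kernel(x, x, bandwidth).mean()
            + gaussian_kernel(y, y, bandwidth).mean()
            - 2.0 * gaussian_kernel(x, y, bandwidth).mean())

rng = np.random.default_rng(0)

# Assumed reference: actions spread uniformly over a 1-D action range [-1, 1].
reference = rng.uniform(-1.0, 1.0, size=(512, 1))

# Two hypothetical policies: one nearly collapsed to a point, one well spread.
collapsed = 0.3 + 0.01 * rng.standard_normal((512, 1))
spread = rng.uniform(-1.0, 1.0, size=(512, 1))

penalty_collapsed = mmd_squared(collapsed, reference)
penalty_spread = mmd_squared(spread, reference)
print(penalty_collapsed > penalty_spread)
```

Under these assumptions, a nearly deterministic policy receives a larger penalty than a well-spread one, so minimizing the penalty alongside the policy objective would push the actor toward broader exploration.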

Abstract

In this paper, we propose the first fully push-forward-based distributional reinforcement learning algorithm, named PACER, which consists of a distributional critic, a stochastic actor, and a sample-based encourager. Specifically, the push-forward operator is leveraged in both the critic and the actor to model the return distributions and the stochastic policies, respectively, giving them equal modeling capability and thus enhancing their synergetic performance. Since it is infeasible to obtain the density function of the push-forward policies, novel sample-based regularizers are integrated into the encourager to incentivize efficient exploration and alleviate the risk of becoming trapped in local optima. Moreover, a sample-based stochastic utility value policy gradient is established for the push-forward policy update, which circumvents the explicit demand of the policy density function in existing…
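A push-forward policy can be sketched as a deterministic map applied to base noise: actions are drawn by sampling the noise and pushing it through the map, so sampling is cheap while the density of the resulting action distribution is generally unavailable in closed form. The example below is a minimal illustration under stated assumptions, not the authors' architecture: the tanh MLP, the Gaussian base noise, the dimensions, and the names `init_mlp` and `push_forward_policy` are all hypothetical.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Hypothetical helper: random weights and zero biases for a small MLP."""
    return [(rng.normal(0.0, 0.5, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def push_forward_policy(params, state, noise):
    """Push base noise through a state-conditioned map: a = f(state, xi).

    state: shape (state_dim,); noise: shape (n_samples, noise_dim).
    Returns actions of shape (n_samples, action_dim) in [-1, 1].
    """
    n_samples = noise.shape[0]
    # Repeat the state for every noise sample and concatenate the two inputs.
    tiled_state = np.broadcast_to(state, (n_samples, state.shape[0]))
    x = np.concatenate([tiled_state, noise], axis=1)
    for weights, bias in params[:-1]:
        x = np.tanh(x @ weights + bias)
    weights, bias = params[-1]
    return np.tanh(x @ weights + bias)

rng = np.random.default_rng(0)
state_dim, noise_dim, action_dim = 3, 2, 1
params = init_mlp(rng, [state_dim + noise_dim, 16, action_dim])

state = np.array([0.1, -0.4, 0.7])
noise = rng.standard_normal((256, noise_dim))  # assumed Gaussian base distribution
actions = push_forward_policy(params, state, noise)
print(actions.shape)
```

Because the map is an arbitrary network rather than a parametric family such as a Gaussian, the induced action distribution can be multimodal or skewed, which is why both the regularizers and the policy gradient described in the abstract are built from samples alone.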

Taxonomy

Topics: Reinforcement Learning in Robotics