PG-Rainbow: Using Distributional Reinforcement Learning in Policy Gradient Methods

WooJae Jeon; KangJun Lee; Jeewoo Lee

arXiv:2407.13146·cs.LG·July 22, 2024

TL;DR

PG-Rainbow integrates distributional reinforcement learning with policy gradient methods, using quantile information about the return distribution to improve the agent's decision-making. The approach is evaluated on Atari game benchmarks.

Contribution

The paper introduces a novel algorithm that combines distributional RL with policy gradients: an Implicit Quantile Network supplies quantile information to the critic network of Proximal Policy Optimization, enhancing policy evaluation.
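As background on the Implicit Quantile Network component, the quantile Huber loss commonly used to train quantile-based critics can be sketched as follows. This is a minimal NumPy illustration of the standard loss from the distributional RL literature, not the authors' implementation; the function name, argument names, and the default `kappa` are assumptions made for the example.

```python
import numpy as np

def quantile_huber_loss(pred_quantiles, target_samples, taus, kappa=1.0):
    """Quantile Huber loss between predicted quantile values and target samples.

    pred_quantiles: shape (N,), predicted return at each quantile fraction.
    target_samples: shape (M,), sampled target returns.
    taus:           shape (N,), quantile fractions in (0, 1).
    All names here are illustrative assumptions, not taken from the paper.
    """
    pred = np.asarray(pred_quantiles, dtype=float)
    target = np.asarray(target_samples, dtype=float)
    taus = np.asarray(taus, dtype=float)

    # Pairwise errors u[i, j] = target_j - pred_i.
    u = target[None, :] - pred[:, None]
    abs_u = np.abs(u)

    # Huber loss: quadratic near zero, linear beyond kappa.
    huber = np.where(abs_u <= kappa, 0.5 * u ** 2, kappa * (abs_u - 0.5 * kappa))

    # Asymmetric weight |tau - 1{u < 0}| turns the Huber loss into a quantile loss.
    weight = np.abs(taus[:, None] - (u < 0).astype(float))

    # Sum over predicted quantiles, average over target samples.
    return float((weight * huber / kappa).sum(axis=0).mean())
```

For a high quantile fraction such as 0.9, underestimating the target is penalized more heavily than overestimating it, which is what pushes each output toward its own quantile rather than toward the mean.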

Findings

01. Improved performance on Atari-2600 benchmarks.

02. Enhanced decision-making capabilities in policy agents.

03. Effective incorporation of reward distribution information.

Abstract

This paper introduces PG-Rainbow, a novel algorithm that incorporates a distributional reinforcement learning framework with a policy gradient algorithm. Existing policy gradient methods are sample inefficient and rely on the mean of returns when calculating the state-action value function, neglecting the distributional nature of returns in reinforcement learning tasks. To address this issue, we use an Implicit Quantile Network that provides the quantile information of the distribution of rewards to the critic network of the Proximal Policy Optimization algorithm. We show empirical results that through the integration of reward distribution information into the policy network, the policy agent acquires enhanced capabilities to comprehensively evaluate the consequences of potential actions in a given state, facilitating more sophisticated and informed decision-making processes. We…
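To illustrate the abstract's point that the mean of returns hides distributional information, the sketch below compares two sets of returns with the same mean but very different quantiles. The return values, the quantile fractions, and the helper name `summarize_returns` are made up for illustration and do not come from the paper.

```python
import numpy as np

def summarize_returns(returns, taus):
    """Return the mean and the empirical quantiles of a set of sampled returns."""
    returns = np.asarray(returns, dtype=float)
    return returns.mean(), np.quantile(returns, taus)

# Hypothetical quantile fractions and sampled returns for two actions.
taus = np.array([0.1, 0.5, 0.9])
safe_returns = [9.0, 10.0, 11.0, 10.0]    # tightly clustered outcomes
risky_returns = [0.0, 0.0, 20.0, 20.0]    # all-or-nothing outcomes

safe_mean, safe_q = summarize_returns(safe_returns, taus)
risky_mean, risky_q = summarize_returns(risky_returns, taus)

print("means:", safe_mean, risky_mean)
print("safe quantiles: ", safe_q)
print("risky quantiles:", risky_q)
```

A critic that sees only the mean cannot tell these two actions apart, whereas one that also receives quantile values can distinguish the low-variance action from the high-variance one.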

Taxonomy

Topics: Reinforcement Learning in Robotics