Generalized Advantage Estimation for Distributional Policy Gradients

Shahil Shaik, Jonathon M. Smereka, and Yue Wang

arXiv:2507.17530·cs.LG·July 24, 2025

Open Access

TL;DR

This paper introduces Distributional GAE (DGAE), a new advantage estimation method for distributional RL that uses optimal transport theory to better handle stochasticity and improve policy gradient estimates.

Contribution

The paper proposes DGAE, which integrates a Wasserstein-like metric into advantage estimation for distributional RL, with the aim of making policy updates more robust and lower in variance.

Findings

01. DGAE outperforms traditional GAE in various environments.

02. DGAE provides low-variance advantage estimates with controlled bias.

03. The method improves policy gradient stability in stochastic systems.

Abstract

Generalized Advantage Estimation (GAE) has been used to mitigate the computational complexity of reinforcement learning (RL) by employing an exponentially weighted estimate of the advantage function to reduce the variance of policy gradient estimates. Despite its effectiveness, GAE is not designed to handle the value distributions that are integral to distributional RL, which can capture the inherent stochasticity of a system and is hence more robust to system noise. To address this gap, we propose a novel approach that uses optimal transport theory to introduce a Wasserstein-like directional metric, which measures both the distance and the directional discrepancy between probability distributions. Using exponentially weighted estimation, we leverage this Wasserstein-like directional metric to derive distributional GAE (DGAE). Similar to traditional GAE, our proposed DGAE provides a…
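
For reference, the exponentially weighted estimator that the abstract attributes to traditional GAE can be sketched as follows. This is a minimal illustration of the standard scalar formulation, not the paper's DGAE and not its Wasserstein-like directional metric. The function name `gae_advantages`, the argument layout, and the default `gamma` and `lam` values are assumptions made for this example.

```python
from typing import List, Sequence


def gae_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    gamma: float = 0.99,  # discount factor (illustrative default)
    lam: float = 0.95,    # GAE weighting parameter (illustrative default)
) -> List[float]:
    """Standard scalar GAE for a single trajectory (hypothetical helper).

    `values` must hold one more entry than `rewards`: the last entry is the
    bootstrap value of the state reached after the final reward (use 0.0 if
    that state is terminal).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have len(rewards) + 1 entries")

    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the discounted sum of later TD residuals.
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference residual at step t.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum with weight (gamma * lam) per step.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

With `lam = 0` the estimate reduces to the one-step TD residual (lower variance, more bias); with `lam = 1` it becomes the discounted return minus the value baseline (higher variance, less bias). The abstract indicates that DGAE replaces the scalar quantities in this scheme with comparisons between value distributions; the truncated text does not give the exact form.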

Taxonomy

Topics: Health Systems, Economic Evaluations, Quality of Life · Monetary Policy and Economic Impact · Advanced Causal Inference Techniques