Wasserstein Adaptive Value Estimation for Actor-Critic Reinforcement Learning

Ali Baheri, Zahra Shahrooei, and Chirayu Salgarkar

arXiv:2501.10605·cs.LG·March 10, 2025

Open Access

TL;DR

WAVE introduces an adaptive Wasserstein regularization technique for actor-critic reinforcement learning, improving stability and convergence with theoretical guarantees and empirical success.

Contribution

The paper proposes an adaptive Wasserstein regularization method for actor-critic algorithms, with a proven convergence rate for the critic and improved training stability.

Findings

01

WAVE achieves an $O(1/k)$ convergence rate for the critic's mean squared error.

02

The method improves stability over standard actor-critic algorithms.

03

In the reported experiments, WAVE outperforms standard actor-critic methods.

Abstract

We present Wasserstein Adaptive Value Estimation for Actor-Critic (WAVE), an approach to enhance stability in deep reinforcement learning through adaptive Wasserstein regularization. Our method addresses the inherent instability of actor-critic algorithms by incorporating an adaptively weighted Wasserstein regularization term into the critic's loss function. We prove that WAVE achieves an $O\left(\frac{1}{k}\right)$ convergence rate for the critic's mean squared error and provide theoretical guarantees for stability through Wasserstein-based regularization. Using the Sinkhorn approximation for computational efficiency, our approach automatically adjusts the regularization based on the agent's performance. Theoretical analysis and experimental results demonstrate that WAVE achieves superior performance compared to standard actor-critic methods.
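
To make the abstract's ingredients concrete, here is a minimal NumPy sketch of a critic loss with a Sinkhorn-approximated Wasserstein penalty. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not say which two distributions are compared or how the weight is adapted, so the choice of predicted values versus targets, the squared-distance ground cost, and the `adaptive_weight` rule are all hypothetical placeholders. Only the Sinkhorn iteration itself is the standard algorithm.

```python
import numpy as np


def sinkhorn_cost(x, y, epsilon=0.1, n_iters=200):
    """Entropy-regularized transport cost between two 1-D samples (uniform weights).

    Standard Sinkhorn scaling iterations. This plain (non log-domain) form can
    underflow when costs are very large relative to epsilon.
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(1, -1)
    cost = (x - y) ** 2                       # squared-distance ground cost, shape (n, m)
    a = np.full(cost.shape[0], 1.0 / cost.shape[0])
    b = np.full(cost.shape[1], 1.0 / cost.shape[1])
    kernel = np.exp(-cost / epsilon)          # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (kernel @ v)                  # rescale to match row marginals
        v = b / (kernel.T @ u)                # rescale to match column marginals
    plan = u[:, None] * kernel * v[None, :]   # approximate transport plan
    return float(np.sum(plan * cost))


def adaptive_weight(lam, recent_return, baseline_return,
                    rate=0.1, lam_min=0.0, lam_max=1.0):
    """HYPOTHETICAL adaptation rule (not from the paper): strengthen the
    regularization when recent performance falls below a baseline, relax it
    otherwise, and keep the weight inside [lam_min, lam_max]."""
    factor = 1.0 + rate if recent_return < baseline_return else 1.0 - rate
    return float(np.clip(lam * factor, lam_min, lam_max))


def critic_loss(values, targets, lam):
    """Mean squared error plus a weighted Sinkhorn penalty.

    ASSUMPTION: the penalty compares predicted values with their targets;
    the abstract does not specify the two distributions.
    """
    values = np.asarray(values, dtype=float)
    targets = np.asarray(targets, dtype=float)
    mse = float(np.mean((values - targets) ** 2))
    return mse + lam * sinkhorn_cost(values, targets)
```

In a training loop one would call `adaptive_weight` once per evaluation period and pass the result as `lam` to `critic_loss`. A differentiable version would need the same computation in an autodiff framework, and a log-domain Sinkhorn variant is the usual choice when `epsilon` is small relative to the costs.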
