Wasserstein Barycenter Soft Actor-Critic

Zahra Shahrooei; Ali Baheri

arXiv:2506.10167 · cs.LG · February 25, 2026

TL;DR

This paper introduces WBSAC, a reinforcement learning algorithm that combines a pessimistic and an optimistic policy through their Wasserstein barycenter to improve sample efficiency in continuous control tasks.

Contribution

WBSAC is the first off-policy actor-critic algorithm to use the Wasserstein barycenter to combine an exploitation (pessimistic) policy with an exploration (optimistic) policy.

Findings

01. WBSAC outperforms state-of-the-art off-policy actor-critic algorithms on MuJoCo tasks.
02. WBSAC achieves higher sample efficiency in environments with sparse rewards.
03. The Wasserstein barycenter approach effectively balances exploration and exploitation.

Abstract

Deep off-policy actor-critic algorithms have emerged as the leading framework for reinforcement learning in continuous control domains. However, most of these algorithms suffer from poor sample efficiency, especially in environments with sparse rewards. In this paper, we take a step towards addressing this issue by providing a principled directed exploration strategy. We propose the Wasserstein Barycenter Soft Actor-Critic (WBSAC) algorithm, which benefits from a pessimistic actor for temporal difference learning and an optimistic actor to promote exploration. This is achieved by using the Wasserstein barycenter of the pessimistic and optimistic policies as the exploration policy and adjusting the degree of exploration throughout the learning process. We compare WBSAC with state-of-the-art off-policy actor-critic algorithms and show that WBSAC is more sample-efficient on MuJoCo continuous…
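To make the barycenter step concrete, here is a minimal sketch, not the authors' implementation. It assumes both actors output diagonal Gaussians over pre-tanh actions (as in standard SAC), that the barycenter is taken with respect to the 2-Wasserstein distance, and that a single weight lam in [0, 1] on the optimistic policy sets the degree of exploration. Under these assumptions the barycenter has a closed form: it is a Gaussian whose mean and standard deviation are the lam-weighted averages of the two policies' means and standard deviations. All function names are hypothetical. The linear schedule `linear_lambda` is an invented illustration, since the abstract does not say how the exploration weight is adjusted.

```python
import numpy as np


def w2_sq_diag_gaussian(mu1, std1, mu2, std2):
    """Squared 2-Wasserstein distance between two diagonal Gaussians.

    For commuting (here diagonal) covariances this reduces to
    ||mu1 - mu2||^2 + ||std1 - std2||^2.
    """
    return float(np.sum((mu1 - mu2) ** 2) + np.sum((std1 - std2) ** 2))


def w2_barycenter_diag(mu_pess, std_pess, mu_opt, std_opt, lam):
    """Weighted W2 barycenter of a pessimistic and an optimistic diagonal Gaussian.

    lam is the weight on the optimistic policy: lam = 0 returns the
    pessimistic policy, lam = 1 the optimistic one. For diagonal Gaussians the
    barycenter is again Gaussian, with the lam-weighted mean and the
    lam-weighted standard deviation.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    mu = (1.0 - lam) * mu_pess + lam * mu_opt
    std = (1.0 - lam) * std_pess + lam * std_opt
    return mu, std


def linear_lambda(step, total_steps, lam_start=0.5, lam_end=0.0):
    """Hypothetical schedule: move the exploration weight linearly from
    lam_start to lam_end over total_steps, then hold it at lam_end."""
    frac = min(step / total_steps, 1.0)
    return lam_start + frac * (lam_end - lam_start)


def sample_exploration_action(rng, mu_pess, std_pess, mu_opt, std_opt, lam):
    """Draw one action from the barycenter policy, then squash it SAC-style.

    The barycenter is taken over the pre-tanh Gaussians (an assumption).
    """
    mu, std = w2_barycenter_diag(mu_pess, std_pess, mu_opt, std_opt, lam)
    pre_tanh = mu + std * rng.standard_normal(mu.shape)
    return np.tanh(pre_tanh)  # squashed into [-1, 1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical actor outputs for a 2-D action space.
    mu_p, std_p = np.array([0.0, 0.2]), np.array([0.1, 0.1])
    mu_o, std_o = np.array([0.4, -0.2]), np.array([0.3, 0.5])
    lam = linear_lambda(step=10_000, total_steps=100_000)
    mu, std = w2_barycenter_diag(mu_p, std_p, mu_o, std_o, lam)
    print("barycenter mean:", mu, "std:", std)
    print("exploration action:", sample_exploration_action(rng, mu_p, std_p, mu_o, std_o, lam))
```

Setting lam to 0 recovers the pessimistic actor and lam to 1 recovers the optimistic one, so shrinking lam over training gradually hands control back to the pessimistic policy. Note that the barycenter is computed here before the tanh squashing. The barycenter of the squashed action distributions would generally differ, so this ordering is a simplifying assumption.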

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Adaptive Dynamic Programming Control