Wasserstein Barycenter Soft Actor-Critic
Zahra Shahrooei, Ali Baheri

TL;DR
This paper introduces WBSAC, a reinforcement learning algorithm that combines a pessimistic and an optimistic policy via their Wasserstein barycenter to improve sample efficiency in continuous control tasks.
Contribution
WBSAC is the first off-policy actor-critic algorithm to use the Wasserstein barycenter to combine exploration and exploitation policies.
Findings
WBSAC is more sample-efficient than state-of-the-art off-policy actor-critic algorithms on MuJoCo continuous control tasks.
WBSAC achieves higher sample efficiency in environments with sparse rewards.
The Wasserstein barycenter approach effectively balances exploration and exploitation.
Abstract
Deep off-policy actor-critic algorithms have emerged as the leading framework for reinforcement learning in continuous control domains. However, most of these algorithms suffer from poor sample efficiency, especially in environments with sparse rewards. In this paper, we take a step towards addressing this issue by providing a principled directed exploration strategy. We propose the Wasserstein Barycenter Soft Actor-Critic (WBSAC) algorithm, which benefits from a pessimistic actor for temporal difference learning and an optimistic actor to promote exploration. This is achieved by using the Wasserstein barycenter of the pessimistic and optimistic policies as the exploration policy and adjusting the degree of exploration throughout the learning process. We compare WBSAC with state-of-the-art off-policy actor-critic algorithms and show that WBSAC is more sample-efficient on MuJoCo continuous…
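To make the barycenter idea concrete, here is a minimal sketch and not the authors' implementation. It assumes both actors output diagonal Gaussian policies, as is common in Soft Actor-Critic, and uses the 2-Wasserstein distance; the abstract confirms neither detail. Under those assumptions the barycenter has a closed form: it is again a diagonal Gaussian whose mean and standard deviation are the weighted averages of the two policies' means and standard deviations. The function names and the `weight` parameter are hypothetical and introduced only for illustration.

```python
import random


def gaussian_w2_barycenter(mu_p, std_p, mu_o, std_o, weight):
    """2-Wasserstein barycenter of two diagonal Gaussians.

    mu_p, std_p: per-dimension mean and std of the pessimistic policy.
    mu_o, std_o: per-dimension mean and std of the optimistic policy.
    weight: hypothetical exploration weight in [0, 1] placed on the
        optimistic policy; 0 recovers the pessimistic policy exactly.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    # For diagonal Gaussians, both the means and the standard deviations
    # (not the variances) interpolate linearly.
    mu = [(1.0 - weight) * p + weight * o for p, o in zip(mu_p, mu_o)]
    std = [(1.0 - weight) * p + weight * o for p, o in zip(std_p, std_o)]
    return mu, std


def sample_action(mu, std, rng):
    """Draw one action from the diagonal Gaussian exploration policy."""
    return [rng.gauss(m, s) for m, s in zip(mu, std)]


# Usage: a 2-D action space with a quarter of the weight on the optimist.
mu, std = gaussian_w2_barycenter([0.0, 1.0], [1.0, 0.5],
                                 [2.0, 3.0], [3.0, 1.5], weight=0.25)
action = sample_action(mu, std, random.Random(0))
```

Lowering `weight` over the course of training is one simple way to reduce the degree of exploration; the schedule WBSAC actually uses is not given in the abstract.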
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Adaptive Dynamic Programming Control
