S$^2$AC: Energy-Based Reinforcement Learning with Stein Soft Actor Critic

Safa Messaoud; Billel Mokeddem; Zhenghai Xue; Linsey Pang; Bo An; Haipeng Chen; Sanjay Chawla

arXiv:2405.00987 · cs.LG · May 3, 2024 · 2 cites

TL;DR

This paper introduces S$^2$AC, a novel energy-based reinforcement learning algorithm that efficiently learns expressive stochastic policies using Stein variational methods, improving performance over existing MaxEnt RL algorithms.

Contribution

S$^2$AC derives a closed-form entropy expression for Stein variational policies, enabling efficient and expressive MaxEnt RL without high computational costs.
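
As a hedged illustration of why a closed-form entropy can exist for a particle-based policy: if each SVGD step $a^{l+1} = a^{l} + \epsilon\,\phi(a^{l})$ is an invertible map, the change-of-variables formula gives the log-density of the final action in terms of the initial one. The symbols $q^{0}$, $\phi$, $\epsilon$, and $L$ are notation assumed here for illustration and may not match the paper's; this is a sketch of the general identity, not a restatement of the paper's derivation.

```latex
% Density of the action after L invertible SVGD steps (change of variables)
\log q^{L}(a^{L}) = \log q^{0}(a^{0})
  - \sum_{l=0}^{L-1} \log \left| \det\!\left( I + \epsilon \, \nabla_{a^{l}} \phi(a^{l}) \right) \right|

% Entropy of the resulting policy
\mathcal{H}(q^{L}) = -\,\mathbb{E}_{a^{0} \sim q^{0}} \left[ \log q^{L}(a^{L}) \right]

% First-order approximation for small step size \epsilon
\log \left| \det\!\left( I + \epsilon A \right) \right| \approx \epsilon \, \mathrm{Tr}(A)
```

Under the small-step approximation, only the trace of the Jacobian of $\phi$ is needed, which is cheaper than a full determinant.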

Findings

01. S$^2$AC outperforms SAC and SQL in multi-goal environments.

02. S$^2$AC achieves better results on MuJoCo benchmarks.

03. The entropy formula simplifies policy evaluation in energy-based RL.

Abstract

Learning expressive stochastic policies instead of deterministic ones has been proposed to achieve better stability, sample complexity, and robustness. Notably, in Maximum Entropy Reinforcement Learning (MaxEnt RL), the policy is modeled as an expressive Energy-Based Model (EBM) over the Q-values. However, this formulation requires the estimation of the entropy of such EBMs, which is an open problem. To address this, previous MaxEnt RL methods either implicitly estimate the entropy, resulting in high computational complexity and variance (SQL), or follow a variational inference procedure that fits simplified actor distributions (e.g., Gaussian) for tractability (SAC). We propose Stein Soft Actor-Critic (S$^2$AC), a MaxEnt RL algorithm that learns expressive policies without compromising efficiency. Specifically, S$^2$AC uses parameterized Stein Variational Gradient Descent (SVGD) as the…
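
To make the SVGD idea in the abstract concrete, here is a minimal NumPy sketch of plain (non-parameterized) SVGD sampling actions from an energy-based target $\pi(a) \propto \exp(Q(a)/\alpha)$. It is not the paper's implementation: the quadratic `Q`, the constants `MU` and `ALPHA`, the fixed RBF bandwidth `h`, and the helper names are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy setup: Q(a) = -0.5 * ||a - MU||^2, so the target policy
# pi(a) ∝ exp(Q(a) / ALPHA) is a Gaussian centred at MU.
MU = np.array([1.0, -1.0])
ALPHA = 1.0


def grad_log_target(a):
    """Score of the toy energy-based policy: grad_a Q(a) / ALPHA."""
    return -(a - MU) / ALPHA


def rbf_kernel(a, h):
    """RBF kernel matrix K[j, i] = k(a_j, a_i) and its gradient w.r.t. a_j."""
    diff = a[:, None, :] - a[None, :, :]            # diff[j, i] = a_j - a_i
    sq_dist = np.sum(diff ** 2, axis=-1)
    K = np.exp(-sq_dist / (2.0 * h ** 2))
    grad_K = -diff / h ** 2 * K[:, :, None]         # grad_{a_j} k(a_j, a_i)
    return K, grad_K


def svgd_step(a, step, h):
    """One SVGD update: attraction along the score plus kernel repulsion."""
    n = a.shape[0]
    K, grad_K = rbf_kernel(a, h)
    score = grad_log_target(a)
    # phi(a_i) = 1/n * sum_j [ k(a_j, a_i) * score(a_j) + grad_{a_j} k(a_j, a_i) ]
    phi = (K.T @ score + grad_K.sum(axis=0)) / n
    return a + step * phi


def sample_actions(n_particles=50, n_steps=500, step=0.1, h=1.0, seed=0):
    """Transport particles from a standard normal toward the target policy."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n_particles, MU.shape[0]))
    for _ in range(n_steps):
        a = svgd_step(a, step, h)
    return a


if __name__ == "__main__":
    actions = sample_actions()
    print("particle mean:", actions.mean(axis=0))
    print("particle std: ", actions.std(axis=0))
```

The first term of `phi` pulls particles toward high-Q actions, while the kernel-gradient term pushes them apart, which is what keeps the particle set from collapsing to a single mode.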

Code & Models

Repositories

safamessaoud/s2ac-energy-based-rl-with-stein-soft-actor-critic
PyTorch · Official

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Dilated Convolution · Convolution · Average Pooling · Global Average Pooling · 1x1 Convolution · Switchable Atrous Convolution · Variational Inference