DSAC-C: Constrained Maximum Entropy for Robust Discrete Soft-Actor Critic

Dexter Neo; Tsuhan Chen

arXiv:2310.17173 · cs.LG · June 24, 2025 · 1 citation

TL;DR

This paper introduces DSAC-C, an extension of the Soft Actor-Critic algorithm that incorporates statistical constraints from a surrogate critic, enhancing robustness and performance in low-data and out-of-distribution scenarios.

Contribution

The paper proposes a novel constrained maximum entropy approach for discrete SAC, improving robustness and performance through additional statistical constraints derived from a surrogate critic.

Findings

01 Enhanced robustness against domain shifts

02 Improved performance in low-data regimes

03 Effective in out-of-distribution Atari experiments

Abstract

We present a novel extension to the family of Soft Actor-Critic (SAC) algorithms. We argue that, based on the Maximum Entropy Principle, discrete SAC can be further improved via additional statistical constraints derived from a surrogate critic policy. Furthermore, our findings suggest that these constraints provide added robustness against potential domain shifts, which is essential for safe deployment of reinforcement learning agents in the real world. We provide theoretical analysis and show empirical results in low-data regimes for both in-distribution and out-of-distribution variants of Atari 2600 games.
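
For orientation, the unconstrained baseline that the abstract builds on can be sketched as follows. This is a minimal NumPy illustration of the standard discrete SAC actor objective and its closed-form maximum-entropy solution, not the authors' implementation. The surrogate-critic constraints that define DSAC-C are not specified on this page and are deliberately not shown. The function names, the toy Q-values, and the temperature `alpha` are assumptions chosen for illustration.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the last axis (the action axis).
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def discrete_sac_policy_loss(logits, q_values, alpha):
    # Standard discrete SAC actor loss (unconstrained baseline):
    #   mean over states of sum_a pi(a|s) * (alpha * log pi(a|s) - Q(s, a))
    log_pi = log_softmax(logits)
    pi = np.exp(log_pi)
    per_state = (pi * (alpha * log_pi - q_values)).sum(axis=-1)
    return per_state.mean()

def max_entropy_policy(q_values, alpha):
    # For fixed Q, the loss above is minimised by the Boltzmann policy
    # pi(a|s) proportional to exp(Q(s, a) / alpha).
    return np.exp(log_softmax(q_values / alpha))

# Toy example (hypothetical values): one state, three actions.
q = np.array([[1.0, 2.0, 3.0]])
alpha = 1.0

loss_boltzmann = discrete_sac_policy_loss(q / alpha, q, alpha)
loss_uniform = discrete_sac_policy_loss(np.zeros_like(q), q, alpha)
print(max_entropy_policy(q, alpha), loss_boltzmann, loss_uniform)
```

In this toy case the Boltzmann policy attains a lower loss than the uniform policy, which is the usual maximum-entropy trade-off between expected value and policy entropy. Any additional constraint of the kind the abstract describes would restrict or reshape this solution.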

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Average Pooling · Global Average Pooling · Dilated Convolution · 1x1 Convolution · Convolution · Switchable Atrous Convolution