DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty

Mingxuan Cui; Duo Zhou; Yuxuan Han; Grani A. Hanasusanto; Qiong Wang; Huan Zhang; Zhengyuan Zhou

arXiv:2506.12622 · cs.LG · April 21, 2026

TL;DR

This paper introduces DR-SAC, a distributionally robust actor-critic algorithm for offline reinforcement learning in continuous action spaces. It targets robustness to environmental uncertainty while remaining computationally efficient.

Contribution

DR-SAC is presented as the first actor-critic-based distributionally robust RL algorithm for offline continuous control. It comes with a convergence guarantee for its distributionally robust soft policy iteration and outperforms the SAC baseline under perturbations.

Findings

01 DR-SAC outperforms the SAC baseline by up to 9.8 times in average reward under perturbations.

02 The algorithm improves computational efficiency and scalability to large-scale problems.

03 A distributionally robust version of soft policy iteration is derived, with a convergence guarantee.

Abstract

Deep reinforcement learning (RL) has achieved remarkable success, yet its deployment in real-world scenarios is often limited by vulnerability to environmental uncertainties. Distributionally robust RL (DR-RL) algorithms have been proposed to resolve this challenge, but existing approaches are largely restricted to value-based methods in tabular settings. In this work, we introduce Distributionally Robust Soft Actor-Critic (DR-SAC), the first actor-critic based DR-RL algorithm for offline learning in continuous action spaces. DR-SAC maximizes the entropy-regularized rewards against the worst possible transition models within a KL-divergence-constrained uncertainty set. We derive the distributionally robust version of the soft policy iteration with a convergence guarantee and incorporate a generative modeling approach to estimate the unknown nominal transition models. Experiment results…
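
The worst-case objective described in the abstract can be made concrete with a small numerical sketch. For a KL-divergence-constrained uncertainty set of radius delta around a nominal distribution P0, a standard duality result rewrites the worst-case expectation of a value V as sup over beta > 0 of -beta * log E_P0[exp(-V / beta)] - beta * delta. The Python below evaluates that dual on a grid for a discrete toy distribution. It is an illustration under stated assumptions, not the authors' implementation: the function name, the toy values, and the grid search over beta are all hypothetical choices made here.

```python
import numpy as np


def kl_robust_expectation(values, nominal_probs, delta, betas=None):
    """Lower-bound the worst-case mean of `values` over a KL ball (hypothetical helper).

    Evaluates the dual  sup_{beta > 0} -beta * log E_P0[exp(-V / beta)] - beta * delta
    on a grid of beta values, where P0 is the nominal distribution.
    """
    values = np.asarray(values, dtype=float)
    probs = np.asarray(nominal_probs, dtype=float)

    # Restrict to the support of the nominal distribution: a KL ball around P0
    # cannot put mass where P0 has none.
    support = probs > 0
    values, probs = values[support], probs[support]

    if betas is None:
        betas = np.logspace(-3, 3, 2000)  # assumed search grid for the dual variable
    betas = np.asarray(betas, dtype=float)

    # Shift by the minimum value so every exponent is <= 0 (avoids overflow).
    v_min = values.min()
    exponents = -(values[None, :] - v_min) / betas[:, None]
    log_mgf = np.log(np.sum(probs[None, :] * np.exp(exponents), axis=1)) - v_min / betas

    # Each beta gives a valid lower bound; the best one on the grid is returned.
    dual = -betas * log_mgf - betas * delta
    return float(dual.max())


# Toy example: three equally likely next-state values under the nominal model.
next_state_values = [1.0, 2.0, 3.0]
nominal = [1 / 3, 1 / 3, 1 / 3]
robust_small = kl_robust_expectation(next_state_values, nominal, delta=0.1)
robust_large = kl_robust_expectation(next_state_values, nominal, delta=0.5)
print(robust_small, robust_large)
```

A larger radius delta yields a more pessimistic value, and delta = 0 recovers (approximately, given the finite grid) the nominal mean. In a robust Bellman backup, a quantity of this form would stand in for the nominal expectation over next-state values. How DR-SAC estimates it in continuous state and action spaces is covered in the paper itself, not in this sketch.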

Code & Models

Repositories

Lemutisme/DR-SAC (GitHub)

Videos

DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty (SlidesLive)