TL;DR
This paper introduces DR-SAC, a novel distributionally robust actor-critic algorithm for offline continuous control in reinforcement learning, enhancing robustness and efficiency under environmental uncertainties.
Contribution
DR-SAC is the first actor-critic based distributionally robust RL algorithm for offline continuous control, with theoretical convergence guarantees and improved performance.
Findings
DR-SAC outperforms the SAC baseline by up to 9.8 times in average reward under perturbations.
The algorithm improves computational efficiency and scalability for large-scale problems.
Distributionally robust soft policy iteration is derived with convergence guarantees.
Abstract
Deep reinforcement learning (RL) has achieved remarkable success, yet its deployment in real-world scenarios is often limited by vulnerability to environmental uncertainties. Distributionally robust RL (DR-RL) algorithms have been proposed to resolve this challenge, but existing approaches are largely restricted to value-based methods in tabular settings. In this work, we introduce Distributionally Robust Soft Actor-Critic (DR-SAC), the first actor-critic based DR-RL algorithm for offline learning in continuous action spaces. DR-SAC maximizes the entropy-regularized rewards against the worst possible transition models within a KL-divergence constrained uncertainty set. We derive the distributionally robust version of the soft policy iteration with a convergence guarantee and incorporate a generative modeling approach to estimate the unknown nominal transition models. Experiment results…
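The "worst possible transition model within a KL-divergence constrained uncertainty set" mentioned in the abstract can be illustrated with the standard dual form for KL-constrained worst-case expectations: inf over P with KL(P || P0) <= delta of E_P[V] equals sup over beta > 0 of -beta * log E_P0[exp(-V / beta)] - beta * delta. The sketch below evaluates that dual on samples of next-state values drawn from a nominal model. It is a minimal illustration under stated assumptions, not the paper's implementation: the function name `kl_worst_case_mean`, the grid search over `beta`, and the sample data are all hypothetical, and the entropy term and critic networks of DR-SAC are omitted.

```python
import numpy as np


def kl_worst_case_mean(values, delta, betas=None):
    """Approximate the worst-case mean of `values` over a KL ball of radius `delta`.

    `values` are samples of V(s') drawn from the nominal transition model
    (hypothetical input). The dual objective is maximized over a fixed grid
    of beta values, which is an illustrative simplification.
    """
    values = np.asarray(values, dtype=float)
    if betas is None:
        betas = np.logspace(-3, 3, 2000)  # assumed search range for beta

    # Stabilized log-mean-exp of -V / beta, one row per beta.
    z = -values[None, :] / betas[:, None]
    z_max = z.max(axis=1, keepdims=True)
    log_mean_exp = z_max[:, 0] + np.log(np.mean(np.exp(z - z_max), axis=1))

    # Dual objective: -beta * log E[exp(-V / beta)] - beta * delta.
    dual = -betas * log_mean_exp - betas * delta
    return float(dual.max())


# Usage: a larger uncertainty radius gives a more pessimistic value estimate.
rng = np.random.default_rng(0)
next_state_values = rng.normal(loc=1.0, scale=0.5, size=1000)
for radius in (0.05, 0.2, 0.5):
    robust = kl_worst_case_mean(next_state_values, radius)
    print(f"delta={radius}: nominal mean={next_state_values.mean():.3f}, robust={robust:.3f}")
```

The robust estimate never exceeds the nominal sample mean and shrinks as `delta` grows, which is the pessimism a distributionally robust critic target builds in.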
