Safe Langevin Soft Actor Critic
Mahesh Keswani, Samyak Jain, Raunak P. Bhattacharyya

TL;DR
SL-SAC is a constrained reinforcement learning algorithm that balances reward and safety by combining parameter-space exploration, distributional risk control, and adaptive constraint enforcement, achieving the lowest costs on 7 of 10 Safety-Gymnasium tasks.
Contribution
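The paper introduces an algorithm that integrates Adaptive Stochastic Gradient Langevin Dynamics (aSGLD), Implicit Quantile Networks (IQN) with Conditional Value-at-Risk (CVaR), and reactive Lagrangian relaxation for safer, more reliable constrained reinforcement learning.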
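To make the Langevin component concrete, here is a minimal sketch of a single plain SGLD update: a gradient step plus Gaussian noise scaled by the step size. This is not the paper's adaptive variant (aSGLD), whose adaptation rule is not given in this summary. The function name `sgld_step` and the `temperature` parameter are assumptions chosen for illustration.

```python
import math
import random


def sgld_step(params, grads, step_size, temperature=1.0, rng=random):
    """One plain SGLD update on a flat list of parameters (hypothetical helper).

    theta <- theta - step_size * grad + sqrt(2 * step_size * temperature) * xi,
    where xi is standard Gaussian noise drawn independently per parameter.
    """
    noise_scale = math.sqrt(2.0 * step_size * temperature)
    return [
        p - step_size * g + noise_scale * rng.gauss(0.0, 1.0)
        for p, g in zip(params, grads)
    ]


# Usage: with temperature=0 the update reduces to ordinary gradient descent;
# with temperature>0 the injected noise perturbs each critic differently.
rng = random.Random(0)
theta = sgld_step([1.0, 2.0], [0.5, -1.0], step_size=0.1, rng=rng)
```

The injected noise is what lets independently updated critics drift apart, which is one plausible reading of the "ensemble diversity" claim in the abstract.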
Findings
Achieves the lowest costs on 7 of 10 Safety-Gymnasium tasks.
Reduces velocity task costs by 19-63% compared to baselines.
Provides theoretical guarantees on CVaR estimation error.
Abstract
Balancing reward and safety in constrained reinforcement learning remains challenging due to poor generalization from sharp value minima and inadequate handling of heavy-tailed risk distributions. We introduce Safe Langevin Soft Actor-Critic (SL-SAC), a principled algorithm that addresses both issues through parameter-space exploration and distributional risk control. Our approach combines three key mechanisms: (1) Adaptive Stochastic Gradient Langevin Dynamics (aSGLD) for reward critics, promoting ensemble diversity and escape from poor optima; (2) distributional cost estimation via Implicit Quantile Networks (IQN) with Conditional Value-at-Risk (CVaR) optimization for tail-risk mitigation; and (3) a reactive Lagrangian relaxation scheme that adapts constraint enforcement based on the empirical CVaR of episodic costs. We provide theoretical guarantees on CVaR estimation error and…
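Mechanisms (2) and (3) can be illustrated with a small sketch: an empirical CVaR computed as the mean of the worst episodic costs, and a projected dual-ascent update that raises the Lagrange multiplier when that CVaR exceeds a cost limit. This is a generic construction under stated assumptions, not the paper's exact reactive scheme. The names `empirical_cvar`, `update_multiplier`, `cost_limit`, and `lr` are hypothetical.

```python
import math


def empirical_cvar(costs, alpha):
    """Mean of the worst (1 - alpha) fraction of episodic costs (upper tail)."""
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    ordered = sorted(costs, reverse=True)  # largest (worst) costs first
    k = max(1, math.ceil((1.0 - alpha) * len(ordered)))  # tail size, at least 1
    return sum(ordered[:k]) / k


def update_multiplier(lam, cvar, cost_limit, lr):
    """Projected dual ascent: grow lam on violation, shrink otherwise, keep lam >= 0."""
    return max(0.0, lam + lr * (cvar - cost_limit))


# Usage: four episodes, CVaR over the worst half, then one multiplier update.
episode_costs = [1.0, 2.0, 3.0, 10.0]
cvar = empirical_cvar(episode_costs, alpha=0.5)  # mean of [10.0, 3.0]
lam = update_multiplier(0.0, cvar, cost_limit=5.0, lr=0.1)
```

Driving the multiplier from the tail average rather than the mean cost makes the penalty react to rare, expensive episodes, which is the motivation the abstract gives for using CVaR.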
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques
