Spectral-Risk Safe Reinforcement Learning with Convergence Guarantees

Dohyeong Kim; Taehyun Cho; Seungyub Han; Hojun Chung; Kyungjae Lee; Songhwai Oh

arXiv:2405.18698 · cs.LG · May 30, 2024

TL;DR

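This paper introduces SRCPO (spectral-risk-constrained policy optimization), a reinforcement learning algorithm whose constraints are expressed with spectral risk measures. It comes with a convergence guarantee in the tabular setting and, on continuous control tasks, outperforms existing risk-constrained methods while satisfying the risk constraints.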

Contribution

It proposes a bilevel optimization algorithm for risk-constrained RL with spectral risk measures that is, to the authors' knowledge, the first to guarantee convergence to an optimum in the tabular setting.

Findings

01. Achieves convergence to an optimum in tabular settings.

02. Outperforms other RCRL algorithms on continuous control tasks.

03. Demonstrates effective risk constraint satisfaction.

Abstract

The field of risk-constrained reinforcement learning (RCRL) has been developed to effectively reduce the likelihood of worst-case scenarios by explicitly handling risk-measure-based constraints. However, the nonlinearity of risk measures makes it challenging to achieve convergence and optimality. To overcome the difficulties posed by the nonlinearity, we propose a spectral risk measure-constrained RL algorithm, spectral-risk-constrained policy optimization (SRCPO), a bilevel optimization approach that utilizes the duality of spectral risk measures. In the bilevel optimization structure, the outer problem involves optimizing dual variables derived from the risk measures, while the inner problem involves finding an optimal policy given these dual variables. The proposed method, to the best of our knowledge, is the first to guarantee convergence to an optimum in the tabular setting.…
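To make the "duality of spectral risk measures" mentioned in the abstract more concrete, here is a minimal sketch, not the paper's implementation. It assumes a cost random variable represented by samples, uses the standard Rockafellar-Uryasev dual form of CVaR, and treats a spectral risk measure as a convex combination of CVaRs (one common discretization of the spectrum). The function names, the levels `alphas`, and the `weights` are all hypothetical and chosen only for illustration. The scalar `beta` plays the role of a dual variable: in a bilevel scheme of the kind the abstract describes, an outer loop would adjust such variables while an inner loop optimizes the policy for fixed values.

```python
import numpy as np

def cvar_sorted(costs, alpha):
    """CVaR_alpha as the mean of the worst (1 - alpha) fraction of sampled costs.

    Assumes (1 - alpha) * len(costs) is (numerically) an integer.
    """
    x = np.sort(np.asarray(costs, dtype=float))
    n = len(x)
    k = int(round((1.0 - alpha) * n))  # number of tail samples
    return x[n - k:].mean()

def cvar_dual(costs, alpha):
    """CVaR_alpha via the dual form: min over beta of beta + E[(X - beta)^+] / (1 - alpha)."""
    x = np.asarray(costs, dtype=float)
    # The objective is convex and piecewise linear in beta,
    # so a minimizer can be found among the sample points.
    objective = [b + np.maximum(x - b, 0.0).mean() / (1.0 - alpha) for b in x]
    return min(objective)

def spectral_risk(costs, alphas, weights):
    """Discretized spectral risk: convex combination of CVaRs at levels `alphas`.

    `weights` are assumed nonnegative and to sum to one (hypothetical spectrum).
    """
    return sum(w * cvar_dual(costs, a) for a, w in zip(alphas, weights))

# Illustrative usage on synthetic costs (not data from the paper).
rng = np.random.default_rng(0)
costs = rng.normal(size=1000)
risk = spectral_risk(costs, alphas=[0.0, 0.9], weights=[0.5, 0.5])
```

The point of the dual form is that, for a fixed `beta`, the objective is an ordinary expectation of a function of the cost, which is what makes the inner policy-optimization problem tractable; the nonlinearity of the risk measure is pushed into the outer minimization over `beta`.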

Videos

Spectral-Risk Safe Reinforcement Learning with Convergence Guarantees · SlidesLive

Taxonomy

Topics: Traffic control and management