Improved Regret Bound for Safe Reinforcement Learning via Tighter Cost Pessimism and Reward Optimism
Kihyun Yu, Duksang Lee, William Overman, Dabeen Lee

TL;DR
This paper introduces a new model-based safe reinforcement learning algorithm with tighter cost and reward estimators, achieving improved regret bounds while ensuring no constraint violations, and nearly matching the theoretical lower bounds in certain settings.
Contribution
It proposes novel cost and reward estimators based on a Bellman-type law of total variance, leading to tighter regret bounds and improved theoretical guarantees in safe RL.
Findings
Achieves a regret upper bound of $\tilde{O}((\bar C - \bar C_b)^{-1} H^{2.5} S\sqrt{AK})$
Nearly matches the regret lower bound when $\bar C - \bar C_b = \Omega(H)$
Demonstrates computational effectiveness through numerical experiments
Abstract
This paper studies the safe reinforcement learning problem formulated as an episodic finite-horizon tabular constrained Markov decision process with an unknown transition kernel and stochastic reward and cost functions. We propose a model-based algorithm based on novel cost and reward function estimators that provide tighter cost pessimism and reward optimism. While guaranteeing no constraint violation in every episode, our algorithm achieves a regret upper bound of $\tilde{O}((\bar C - \bar C_b)^{-1} H^{2.5} S\sqrt{AK})$, where $\bar C$ is the cost budget for an episode, $\bar C_b$ is the expected cost under a safe baseline policy over an episode, $H$ is the horizon, and $S$, $A$, and $K$ are the number of states, actions, and episodes, respectively. This improves upon the best-known regret upper bound, and when $\bar C - \bar C_b = \Omega(H)$, it nearly matches the regret…
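As a worked illustration of the regime where the budget gap is of order $H$, suppose $\bar C - \bar C_b \ge cH$ for some constant $c > 0$. The constant $c$ is introduced here only for illustration and does not appear in the paper's summary. Substituting into the upper bound, and ignoring logarithmic factors, gives:

$$
(\bar C - \bar C_b)^{-1} H^{2.5} S\sqrt{AK} \;\le\; (cH)^{-1} H^{2.5} S\sqrt{AK} \;=\; c^{-1} H^{1.5} S\sqrt{AK},
$$

so under this assumption the regret is $\tilde{O}(H^{1.5} S\sqrt{AK})$. Conversely, the bound grows in inverse proportion to the gap $\bar C - \bar C_b$, so a safe baseline policy whose expected cost is close to the budget yields a weaker guarantee.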
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Traffic control and management
