Assured RL: Reinforcement Learning with Almost Sure Constraints

Agustin Castellano; Juan Bazerque; Enrique Mallada

arXiv:2012.13036·cs.LG·December 25, 2020

TL;DR

This paper introduces Barrier-learning, a Q-Learning-based algorithm that uses a damage function to identify unsafe state-action pairs, so that policies learned with existing RL methods satisfy almost sure constraints.

Contribution

The paper proposes a novel Barrier-learning algorithm based on Q-Learning that incorporates a damage function to handle almost sure constraints in RL, enabling model-free feasibility enforcement.

Findings

01 The Barrier-learning algorithm effectively identifies unsafe state-action pairs.

02 It can be integrated with existing RL algorithms like Q-Learning and SARSA.

03 The approach guarantees almost sure constraint satisfaction in policy learning.

Abstract

We consider the problem of finding optimal policies for a Markov Decision Process with almost sure constraints on state transitions and action triplets. We define value and action-value functions that satisfy a barrier-based decomposition which allows for the identification of feasible policies independently of the reward process. We prove that, given a policy π, certifying whether certain state-action pairs lead to feasible trajectories under π is equivalent to solving an auxiliary problem aimed at finding the probability of performing an unfeasible transition. Using this interpretation, we develop a Barrier-learning algorithm, based on Q-Learning, that identifies such unsafe state-action pairs. Our analysis motivates the need to enhance the Reinforcement Learning (RL) framework with an additional signal, besides rewards, called here damage function that provides feasibility…
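The idea of flagging unsafe state-action pairs from a damage signal can be illustrated with a small tabular sketch. This is not the paper's pseudocode: the function `barrier_learning`, the `step(s, a, rng)` environment interface, and the toy chain environment `toy_step` are all hypothetical names introduced here, and the update rule is one plausible reading of the abstract. A pair is flagged when a transition reports damage, or when it leads to a state in which every action is already flagged.

```python
import random


def barrier_learning(step, n_states, n_actions, episodes=200, horizon=20, seed=0):
    """Flag state-action pairs that can lead to a constraint violation.

    `step(s, a, rng) -> (next_state, damage)` is an assumed interface:
    `damage` is 1 if the transition violated a constraint, else 0.
    """
    rng = random.Random(seed)
    # unsafe[s][a] plays the role of a barrier taking two values:
    # False = no violation seen so far, True = known to be unsafe.
    unsafe = [[False] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = rng.randrange(n_states)
        for _ in range(horizon):
            # Explore only among actions not yet flagged in this state.
            candidates = [a for a in range(n_actions) if not unsafe[s][a]]
            if not candidates:
                break  # every action here is flagged; nothing left to try
            a = rng.choice(candidates)
            s_next, damage = step(s, a, rng)
            # Flag on observed damage, or when the next state has no
            # unflagged action left (unsafety propagates backwards).
            if damage or all(unsafe[s_next]):
                unsafe[s][a] = True
            if damage:
                break  # the episode ends on a violation
            s = s_next
    return unsafe


def toy_step(s, a, rng):
    """Hypothetical 4-state chain: action 0 stays, action 1 moves right.

    State 3 is a trap in which every action causes damage.
    """
    if s == 3:
        return s, 1
    return s + a, 0


flags = barrier_learning(toy_step, n_states=4, n_actions=2)
```

In this toy chain, enough exploration should flag both actions in state 3 and the move from state 2 into state 3, while leaving the "stay" action in state 2 unflagged. Flagging a pair after a single observed violation reflects the almost sure nature of the constraints: any transition that can cause damage with positive probability makes the pair unsafe.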

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Formal Methods in Verification

Methods: Q-Learning