Safe Option-Critic: Learning Safety in the Option-Critic Architecture

Arushi Jain; Khimya Khetarpal; Doina Precup

arXiv:1807.08060·cs.AI·July 1, 2021

TL;DR

This paper introduces Safe Option-Critic, a reinforcement learning method that learns safe hierarchical policies by balancing reward maximization with uncertainty minimization, demonstrated across various environments.

Contribution

It proposes a novel objective and policy gradient algorithm for learning safe options within the options framework, emphasizing safety through uncertainty reduction.

Findings

1. Reduces variance of return in tested environments.
2. Improves performance in environments with variable rewards.
3. Outperforms primitive actions and risk-neutral options.

Abstract

Designing hierarchical reinforcement learning algorithms that exhibit safe behaviour is not only vital for practical applications but also facilitates a better understanding of an agent's decisions. We tackle this problem in the options framework, a particular way to specify temporally abstract actions which allows an agent to use sub-policies with start and end conditions. We consider a behaviour safe if it avoids regions of state-space with high uncertainty in the outcomes of actions. We propose an optimization objective that learns safe options by encouraging the agent to visit states with higher behavioural consistency. The proposed objective results in a trade-off between maximizing the standard expected return and minimizing the effect of model uncertainty in the return. We propose a policy gradient algorithm to optimize the constrained objective function. We examine the…
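The trade-off described in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: it assumes, purely for illustration, that uncertainty is measured by the mean squared one-step TD error along a trajectory, and that a hypothetical weight `psi` scales the penalty. The function names, `psi`, and `gamma` are all assumptions that the abstract does not specify.

```python
from statistics import fmean


def td_errors(rewards, values, gamma=0.99):
    """One-step TD errors: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    `values` must hold one more entry than `rewards` (the final state's value).
    """
    return [r + gamma * values[t + 1] - values[t] for t, r in enumerate(rewards)]


def risk_penalized_score(rewards, values, psi=0.1, gamma=0.99):
    """Discounted return minus psi times the mean squared TD error.

    ASSUMPTION: the squared TD error stands in for "uncertainty in the return";
    psi = 0 recovers the ordinary risk-neutral return.
    """
    discounted_return = sum(gamma ** t * r for t, r in enumerate(rewards))
    penalty = fmean(d * d for d in td_errors(rewards, values, gamma))
    return discounted_return - psi * penalty


# Two hypothetical trajectories with the same total reward but different spread.
steady = risk_penalized_score([1.0, 1.0], [0.0, 0.0, 0.0], psi=0.5, gamma=1.0)
noisy = risk_penalized_score([3.0, -1.0], [0.0, 0.0, 0.0], psi=0.5, gamma=1.0)
```

With `psi` greater than zero the steady trajectory scores higher than the noisy one even though both collect the same total reward, which is the kind of preference a risk-sensitive objective is meant to induce.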

Code & Models

Repositories

arushi12130/SafeOptionCritic (Official)
