TL;DR
This paper introduces Safe Option-Critic, a reinforcement learning method that learns safe hierarchical policies by balancing reward maximization with uncertainty minimization, demonstrated across various environments.
Contribution
It proposes a novel objective and policy gradient algorithm for learning safe options within the options framework, emphasizing safety through uncertainty reduction.
Findings
Reduces variance of return in tested environments.
Improves performance in environments with variable rewards.
Outperforms primitive actions and risk-neutral options.
Abstract
Designing hierarchical reinforcement learning algorithms that exhibit safe behaviour is not only vital for practical applications but also facilitates a better understanding of an agent's decisions. We tackle this problem in the options framework, a particular way to specify temporally abstract actions which allow an agent to use sub-policies with start and end conditions. We consider a behaviour safe if it avoids regions of state-space with high uncertainty in the outcomes of actions. We propose an optimization objective that learns safe options by encouraging the agent to visit states with higher behavioural consistency. The proposed objective results in a trade-off between maximizing the standard expected return and minimizing the effect of model uncertainty in the return. We propose a policy gradient algorithm to optimize the constrained objective function. We examine the…
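The trade-off between expected return and uncertainty in the return can be illustrated with a small sketch. This is not the paper's objective, which this summary does not state; it is a generic mean-minus-variance score computed over sampled returns. The function name `risk_adjusted_score`, the weight `psi`, and the two sample lists are all hypothetical and chosen only for illustration.

```python
import statistics


def risk_adjusted_score(returns, psi):
    """Mean return minus psi times the population variance of the return.

    Illustrative assumption only: a generic mean-variance trade-off,
    not the objective defined in the paper.
    """
    mean_return = statistics.mean(returns)
    return_variance = statistics.pvariance(returns)
    return mean_return - psi * return_variance


# Hypothetical sampled returns for two behaviours.
steady = [1.0, 1.0, 1.0, 1.0]   # lower mean, no spread
risky = [4.0, -1.0, 4.0, -1.0]  # higher mean, large spread

for psi in (0.0, 0.5):
    scores = {
        "steady": risk_adjusted_score(steady, psi),
        "risky": risk_adjusted_score(risky, psi),
    }
    preferred = max(scores, key=scores.get)
    print(f"psi={psi}: {scores} -> prefers {preferred}")
```

With `psi = 0` the score reduces to the plain expected return and the higher-mean behaviour is preferred; with a large enough `psi` the variance penalty dominates and the more consistent behaviour is preferred instead.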
