Policy Certificates: Towards Accountable Reinforcement Learning

Christoph Dann; Lihong Li; Wei Wei; Emma Brunskill

arXiv:1811.03056·cs.LG·May 29, 2019·19 cites

TL;DR

This paper introduces policy certificates for reinforcement learning, providing guarantees on policy quality to enhance accountability, especially in high-stakes settings, and presents algorithms with theoretical guarantees and improved sample efficiency.

Contribution

It proposes a novel framework for policy certificates in RL, introduces two new algorithms with certificates, and offers theoretical analysis ensuring policy quality and sample efficiency.

Findings

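1. Computing certificates can improve the sample efficiency of optimism-based exploration in tabular MDPs.

2. One of the two proposed algorithms is the first to achieve minimax-optimal PAC bounds while also outputting certificates.

3. The same analysis yields regret bounds that match or improve on existing minimax regret bounds.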

Abstract

The performance of a reinforcement learning algorithm can vary drastically during learning because of exploration. Existing algorithms provide little information about the quality of their current policy before executing it, and thus have limited use in high-stakes applications like healthcare. We address this lack of accountability by proposing that algorithms output policy certificates. These certificates bound the sub-optimality and return of the policy in the next episode, allowing humans to intervene when the certified quality is not satisfactory. We further introduce two new algorithms with certificates and present a new framework for theoretical analysis that guarantees the quality of their policies and certificates. For tabular MDPs, we show that computing certificates can even improve the sample-efficiency of optimism-based exploration. As a result, one of our algorithms is the…
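
To make the certificate idea concrete, here is a minimal Python sketch for an episodic tabular MDP. It is not the paper's algorithm. It assumes rewards in [0, 1], an empirical model (`P_hat`, `r_hat`), per-pair visit counts, and a generic Hoeffding-style bonus with an illustrative constant. The function name `certificate_sketch` and all of its parameters are hypothetical. The sketch computes an optimistic and a pessimistic value estimate by backward induction, acts greedily with respect to the optimistic one, and reports the gap at the start state as the certificate.

```python
import numpy as np

def certificate_sketch(P_hat, r_hat, counts, horizon, s0=0, delta=0.1):
    """Illustrative only: returns (policy, certified return lower bound, certificate).

    P_hat:  (S, A, S) empirical transition probabilities (assumed input)
    r_hat:  (S, A) empirical mean rewards, assumed to lie in [0, 1]
    counts: (S, A) visit counts used to size the confidence bonus
    """
    S, A = r_hat.shape
    # Hoeffding-style confidence width; the constant is an assumption,
    # not the bonus used in the paper.
    log_term = np.log(2 * S * A * horizon / delta)
    bonus = horizon * np.sqrt(log_term / (2 * np.maximum(counts, 1)))

    V_up = np.zeros(S)   # optimistic value of the next time step
    V_lo = np.zeros(S)   # pessimistic value of the chosen policy
    policy = np.zeros((horizon, S), dtype=int)
    states = np.arange(S)

    for h in reversed(range(horizon)):
        remaining = horizon - h  # largest return still achievable from step h
        Q_up = np.clip(r_hat + P_hat @ V_up + bonus, 0.0, remaining)
        Q_lo = np.clip(r_hat + P_hat @ V_lo - bonus, 0.0, remaining)
        policy[h] = Q_up.argmax(axis=1)        # act greedily on the upper bound
        V_up = Q_up[states, policy[h]]
        V_lo = Q_lo[states, policy[h]]         # evaluate the same actions pessimistically

    # The gap bounds sub-optimality whenever the true model lies inside
    # the confidence set; V_lo[s0] bounds the policy's return from below.
    return policy, V_lo[s0], V_up[s0] - V_lo[s0]
```

Under these assumptions, a human overseer could compare the returned certificate against a tolerance before the episode runs and intervene when it is too large. The gap shrinks as visit counts grow, which mirrors the intuition that certificates tighten as the algorithm learns.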

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning