Constrained Policy Improvement for Safe and Efficient Reinforcement Learning

Elad Sarafian; Aviv Tamar; Sarit Kraus

arXiv:1805.07805 · cs.LG · July 12, 2019 · 1 citation

TL;DR

This paper introduces Rerouted Behavior Improvement (RBI), a policy improvement method for RL that accounts for Q-function estimation errors. It reduces the risk of negative policy improvement and improves safety and data efficiency in both batch and iterative learning settings.

Contribution

The paper presents RBI, a novel policy improvement algorithm that mitigates the impact of Q-value estimation errors, improving safety and efficiency in reinforcement learning.

Findings

1. RBI avoids catastrophic performance degradation.
2. RBI increases data efficiency when the optimal action has high reward variance.
3. RBI outperforms greedy policies and other constrained policy optimization algorithms in experiments.

Abstract

We propose a policy improvement algorithm for Reinforcement Learning (RL) called Rerouted Behavior Improvement (RBI). RBI is designed to take into account the evaluation errors of the $Q$-function. Such errors are common in RL when the $Q$-value is learned from finite past experience data. Greedy policies, and even constrained policy optimization algorithms, may suffer from an improvement penalty (i.e., a negative policy improvement) when they ignore these errors. To minimize the improvement penalty, RBI attenuates rapid policy changes for low-probability actions, which were sampled less frequently. This approach is shown to avoid catastrophic performance degradation and to reduce regret when learning from a batch of past experience. Using a two-armed bandit example with Gaussian-distributed rewards, we show that it also increases data efficiency when the optimal action has a…
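The abstract describes the RBI idea only at a high level. The Python sketch below illustrates one way to attenuate policy changes for rarely sampled actions: bound the improved policy π by multiples of the behavior policy β, then maximize the expected estimated $Q$-value inside those bounds. The ratio constraint c_min·β(a) ≤ π(a) ≤ c_max·β(a), the function name `rerouted_improvement`, and the default bounds are assumptions made for illustration. This is not necessarily the paper's exact formulation or the code in the `eladsar/rbi` repository.

```python
import numpy as np


def rerouted_improvement(q, beta, c_min=0.5, c_max=1.5):
    """Hypothetical constrained improvement step (illustrative assumption).

    Maximizes sum_a pi(a) * q(a) subject to
        c_min * beta(a) <= pi(a) <= c_max * beta(a),  sum_a pi(a) = 1.
    Because every bound is proportional to beta, an action the behavior
    policy rarely took (small beta) can only gain a little probability,
    even if its noisy Q estimate looks best.
    """
    q = np.asarray(q, dtype=float)
    beta = np.asarray(beta, dtype=float)
    if not np.isclose(beta.sum(), 1.0):
        raise ValueError("beta must be a probability distribution")
    if c_min > 1.0 or c_max < 1.0:
        raise ValueError("need c_min <= 1 <= c_max so a feasible policy exists")

    # Start every action at its lower bound.
    pi = c_min * beta
    budget = 1.0 - pi.sum()

    # Hand out the remaining probability mass to actions in order of
    # decreasing estimated value, never exceeding each upper bound.
    # With box constraints and a linear objective this greedy fill is optimal.
    for a in np.argsort(-q):
        add = min(c_max * beta[a] - pi[a], budget)
        pi[a] += add
        budget -= add
        if budget <= 0.0:
            break
    return pi


if __name__ == "__main__":
    q_hat = [1.0, 2.0]  # noisy estimates: action 1 looks better
    beta = [0.9, 0.1]   # ...but action 1 was rarely sampled
    print(rerouted_improvement(q_hat, beta))  # approximately [0.85, 0.15]
```

In this toy case a greedy policy would move all probability to action 1 on the strength of an estimate built from few samples, whereas the constrained step only raises its probability from 0.1 to 0.15. If the estimate is wrong, the resulting performance drop stays small.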

Code & Models

Repositories

eladsar/rbi
PyTorch · Official

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management