Constrained Policy Improvement for Safe and Efficient Reinforcement Learning
Elad Sarafian, Aviv Tamar, Sarit Kraus

TL;DR
This paper introduces Rerouted Behavior Improvement (RBI), a policy improvement method for RL that accounts for Q-function errors, reducing negative policy shifts and enhancing safety and data efficiency in batch and iterative learning scenarios.
Contribution
The paper presents RBI, a novel policy improvement algorithm that mitigates the impact of Q-value estimation errors, improving safety and efficiency in reinforcement learning.
Findings
RBI avoids catastrophic performance degradation.
RBI increases data efficiency when actions have high-variance rewards.
RBI outperforms greedy and other constrained algorithms in experiments.
Abstract
We propose a policy improvement algorithm for Reinforcement Learning (RL) called Rerouted Behavior Improvement (RBI). RBI is designed to take into account the evaluation errors of the Q-function. Such errors are common in RL when learning the Q-value from finite past experience data. Greedy policies, and even constrained policy optimization algorithms that ignore these errors, may suffer from an improvement penalty (i.e. a negative policy improvement). To minimize the improvement penalty, RBI attenuates rapid policy changes for low-probability actions, which were sampled less frequently. This approach is shown to avoid catastrophic performance degradation and reduce regret when learning from a batch of past experience. Through an example of a two-armed bandit with Gaussian-distributed rewards, we show that it also increases data efficiency when the optimal action has a…
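The idea of attenuating policy changes for rarely sampled actions can be illustrated with a small sketch. The abstract above does not state RBI's exact constraint, so this example assumes one plausible form: the new policy pi must stay within a fixed ratio of the behavior policy beta, i.e. c_min * beta(a) <= pi(a) <= c_max * beta(a). The function name `rerouted_policy` and the parameters `c_min` and `c_max` are hypothetical and chosen only for illustration; this is not presented as the authors' implementation.

```python
import numpy as np

def rerouted_policy(beta, q_hat, c_min=0.5, c_max=1.5):
    """Ratio-constrained improvement step for a single state (illustrative only).

    Assumed constraint: c_min * beta <= pi <= c_max * beta, with sum(pi) == 1.
    Requires c_min <= 1 <= c_max so that a valid distribution exists.
    """
    beta = np.asarray(beta, dtype=float)
    q_hat = np.asarray(q_hat, dtype=float)

    # Every action keeps at least a c_min fraction of its behavior probability.
    pi = c_min * beta
    budget = 1.0 - pi.sum()  # probability mass still to be assigned

    # Hand the remaining mass to actions in order of estimated value,
    # capping each action at c_max times its behavior probability.
    for a in np.argsort(-q_hat):
        add = min(budget, (c_max - c_min) * beta[a])
        pi[a] += add
        budget -= add
        if budget <= 0.0:
            break
    return pi

# A rarely sampled action (beta = 0.1) has the higher, but less reliable, estimate.
beta = [0.9, 0.1]
q_hat = [0.0, 1.0]
pi = rerouted_policy(beta, q_hat)  # approximately [0.85, 0.15]
```

A greedy step would move all probability to the second action on the strength of a Q-estimate built from few samples. Under the assumed ratio bound, that action's probability can grow only from 0.10 to 0.15 in one step, so a wrong estimate costs little, while a correct one can still be exploited over repeated improvement steps.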
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management
