Replay For Safety

Liran Szlak; Ohad Shamir

arXiv:2112.04229·cs.LG·December 9, 2021

TL;DR

This paper investigates how experience replay sampling schemes affect the convergence and properties of policies in reinforcement learning, proposing biased replay as a means to achieve safer policies.

Contribution

It establishes conditions for convergence in tabular Q-learning and introduces biased replay as a method to control and improve policy safety.

Findings

01 Conditions for convergence with replay sampling schemes

02 Biased replay can modify policy properties

03 Potential for safer policies through replay biasing

Abstract

Experience replay (Lin, 1993; Mnih et al., 2015) is a widely used technique for making efficient use of data and improving performance in RL algorithms. In experience replay, past transitions are stored in a memory buffer and re-used during learning. Previous works have proposed various schemes for sampling from the replay buffer, attempting to choose the experiences that contribute most to convergence to an optimal policy. Here, we give some conditions on the replay sampling scheme that ensure convergence, focusing on the well-known Q-learning algorithm in the tabular setting. After establishing sufficient conditions for convergence, we suggest a slightly different use for experience replay: replaying memories in a biased manner as a means to change the properties of the resulting policy. We initiate a rigorous…
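To make the idea of biased replay concrete, here is a minimal sketch of tabular Q-learning driven entirely by a replay buffer. It is not the authors' scheme: the toy buffer, the `weight_fn` that over-samples negative-reward transitions, the factor of 10, and the 1/n step size are all assumptions chosen for illustration. The sampling conditions analysed in the paper are not reproduced in this excerpt.

```python
import random
from collections import defaultdict


def q_update(Q, counts, transition, actions, gamma=0.9):
    """One tabular Q-learning update for a stored transition.

    Uses a 1/n(s, a) step size (an assumption made for this sketch).
    """
    s, a, r, s_next, done = transition
    counts[(s, a)] += 1
    alpha = 1.0 / counts[(s, a)]
    target = r if done else r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def replay(buffer, actions, weight_fn, n_updates, seed=0):
    """Replay transitions with probability proportional to weight_fn(transition)."""
    rng = random.Random(seed)
    Q, counts = defaultdict(float), defaultdict(int)
    weights = [weight_fn(t) for t in buffer]
    for _ in range(n_updates):
        transition = rng.choices(buffer, weights=weights, k=1)[0]
        q_update(Q, counts, transition, actions)
    return Q


def greedy_action(Q, state, actions):
    """Return the action with the highest Q-value in the given state."""
    return max(actions, key=lambda a: Q[(state, a)])


# Hypothetical one-step problem: "safe" always pays 1; "risky" pays 3
# nine times out of ten and -5 otherwise. Tuples are (s, a, r, s_next, done).
ACTIONS = ["safe", "risky"]
BUFFER = (
    [("s0", "safe", 1.0, "end", True)]
    + [("s0", "risky", 3.0, "end", True)] * 9
    + [("s0", "risky", -5.0, "end", True)]
)

# Uniform replay: every stored transition is equally likely.
uniform_Q = replay(BUFFER, ACTIONS, weight_fn=lambda t: 1.0, n_updates=5000)

# Biased replay: negative-reward transitions are replayed 10x more often.
biased_Q = replay(
    BUFFER, ACTIONS, weight_fn=lambda t: 10.0 if t[2] < 0 else 1.0, n_updates=5000
)

print("uniform:", greedy_action(uniform_Q, "s0", ACTIONS))
print("biased: ", greedy_action(biased_Q, "s0", ACTIONS))
```

In this toy setup, uniform replay drives the value of "risky" toward its empirical mean of 2.2, above the value 1 of "safe". Under the biased weights, the replayed mean for "risky" is about -1.2, so the greedy policy switches to "safe". The bias changes which policy the learner settles on without changing the stored data, which is the kind of effect the abstract describes.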

Taxonomy

Topics: Age of Information Optimization · Stochastic Gradient Optimization Techniques · Ferroelectric and Negative Capacitance Devices

Methods: Q-Learning · Experience Replay