Regret Minimization Experience Replay in Off-Policy Reinforcement Learning

Xu-Hui Liu; Zhenghai Xue; Jing-Cheng Pang; Shengyi Jiang; Feng Xu; Yang Yu

arXiv:2105.07253·cs.LG·November 10, 2021·5 cites

TL;DR

This paper introduces a theoretically grounded approach to experience replay prioritization in reinforcement learning, proposing new methods ReMERN and ReMERT that outperform existing algorithms on various benchmarks.

Contribution

It derives an optimal prioritization strategy from regret minimization, providing theoretical insights and two novel algorithms for improved sample reuse.

Findings

01 ReMERN and ReMERT outperform previous methods on MuJoCo, Atari, and Meta-World benchmarks.

02 Theoretically justified prioritization criteria improve policy return.

03 New methods effectively utilize hindsight TD error, on-policiness, and Q-value accuracy.

Abstract

In reinforcement learning, experience replay stores past samples for further reuse. Prioritized sampling is a promising technique for making better use of these samples. Previous prioritization criteria include TD error, recentness, and corrective feedback, most of which are heuristically designed. In this work, we start from the regret minimization objective and obtain an optimal prioritization strategy for the Bellman update that directly maximizes the return of the policy. The theory suggests that data with higher hindsight TD error, better on-policiness, and more accurate Q values should be assigned higher weights during sampling. Most previous criteria therefore capture this strategy only partially. We not only provide theoretical justification for previous criteria, but also propose two new methods to compute the prioritization weight, namely ReMERN and ReMERT. ReMERN learns an error…
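To make the three criteria in the abstract concrete, the following is a minimal sketch of weighted sampling from a replay buffer. It is not the ReMERN or ReMERT weight computation: the multiplicative combination, the function names `priority_weights` and `sample_indices`, and the inputs `on_policy_scores` and `q_errors` are all assumptions chosen for illustration, and a plain absolute TD error stands in for the hindsight TD error discussed in the paper.

```python
import math
import random
from typing import List, Sequence


def priority_weights(
    td_errors: Sequence[float],
    on_policy_scores: Sequence[float],
    q_errors: Sequence[float],
) -> List[float]:
    """Combine three per-transition signals into normalized sampling weights.

    Hypothetical illustration: the product form below is an assumption,
    not the weight derived in the paper.
      td_errors        -- TD error per transition (stand-in for hindsight TD error)
      on_policy_scores -- assumed score in [0, 1]; higher means more on-policy
      q_errors         -- assumed estimate of Q-value error; lower means more accurate
    """
    raw = [
        abs(td) * score * math.exp(-abs(q_err))
        for td, score, q_err in zip(td_errors, on_policy_scores, q_errors)
    ]
    if not raw:
        return []
    total = sum(raw)
    if total == 0.0:
        # Fall back to uniform sampling when every raw weight is zero.
        return [1.0 / len(raw)] * len(raw)
    return [w / total for w in raw]


def sample_indices(weights: Sequence[float], batch_size: int, seed: int = 0) -> List[int]:
    """Draw transition indices with replacement, proportionally to the weights."""
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=batch_size)


if __name__ == "__main__":
    weights = priority_weights(
        td_errors=[1.0, 1.0, 2.0],
        on_policy_scores=[0.5, 1.0, 1.0],
        q_errors=[0.0, 0.0, 0.0],
    )
    print(weights)
    print(sample_indices(weights, batch_size=4))
```

In this sketch a transition gains weight when its TD error is larger, when it is scored as more on-policy, and when its Q-value error estimate is smaller, which mirrors the qualitative ordering stated in the abstract without claiming the same functional form.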

Peer Reviews

No public reviews on file for this paper yet.

Code & Models

Repositories

aidefender/remern-remert (Official)

Videos

Regret Minimization Experience Replay in Off-Policy Reinforcement Learning · slideslive

Taxonomy

Topics: Mental Health Research Topics · Functional Brain Connectivity Studies · Neural dynamics and brain function

Methods: Experience Replay