Prioritized Generative Replay

Renhao Wang; Kevin Frans; Pieter Abbeel; Sergey Levine; Alexei A. Efros

arXiv:2410.18082·cs.LG·May 12, 2025

Open Access 3 Reviews

TL;DR

This paper introduces a prioritized generative replay method for online reinforcement learning that uses generative models to create more relevant and diverse experiences, improving sample efficiency and reducing overfitting.

Contribution

It proposes a novel parametric memory using generative models with relevance guidance, enhancing experience densification and diversity in reinforcement learning.

Findings

01. Improves performance and sample efficiency in state- and pixel-based domains.

02. Guidance promotes diversity and reduces overfitting in generated transitions.

03. Enables training with higher update-to-data ratios.
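
For readers unfamiliar with the term in finding 03, the update-to-data (UTD) ratio is the number of gradient updates performed per environment step. A minimal sketch of a training loop parameterized by it follows; the names `run_training`, `env_step`, and `gradient_update` are hypothetical and not taken from the paper.

```python
def run_training(num_env_steps, utd_ratio, env_step, gradient_update):
    """Interleave data collection with `utd_ratio` gradient updates per environment step."""
    for _ in range(num_env_steps):
        env_step()                  # collect one real transition
        for _ in range(utd_ratio):  # reuse stored (or generated) data several times
            gradient_update()


# Count calls with stand-in callbacks instead of a real agent and environment.
counts = {"env": 0, "grad": 0}


def fake_env_step():
    counts["env"] += 1


def fake_gradient_update():
    counts["grad"] += 1


run_training(num_env_steps=100, utd_ratio=4,
             env_step=fake_env_step, gradient_update=fake_gradient_update)
```

A higher `utd_ratio` extracts more learning signal from each real transition, which is why it is usually paired with some safeguard against overfitting to a small buffer.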

Abstract

Sample-efficient online reinforcement learning often uses replay buffers to store experience for reuse when updating the value function. However, uniform replay is inefficient, since certain classes of transitions can be more relevant to learning. While prioritization of more useful samples is helpful, this strategy can also lead to overfitting, as useful samples are likely to be more rare. In this work, we instead propose a prioritized, parametric version of an agent's memory, using generative models to capture online experience. This paradigm enables (1) densification of past experience, with new generations that benefit from the generative model's generalization capacity and (2) guidance via a family of "relevance functions" that push these generations towards more useful parts of an agent's acquired history. We show this recipe can be instantiated using conditional diffusion models…
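
The two ingredients named in the abstract, a generative model of past experience and a relevance signal that steers what it generates, can be illustrated with a deliberately small sketch. This is not the paper's method: a per-dimension linear-Gaussian model stands in for the conditional diffusion model, plain conditioning stands in for guidance, and the relevance score and the function names `fit_conditional_gaussian` and `generate` are assumptions made for illustration.

```python
import random
from statistics import fmean


def fit_conditional_gaussian(transitions, relevance):
    """Fit x_j ~ N(a_j + b_j * r, s_j^2) independently for each transition dimension j."""
    n = len(relevance)
    r_mean = fmean(relevance)
    r_var = sum((r - r_mean) ** 2 for r in relevance) / n
    params = []
    for j in range(len(transitions[0])):
        xs = [t[j] for t in transitions]
        x_mean = fmean(xs)
        cov = sum((x - x_mean) * (r - r_mean) for x, r in zip(xs, relevance)) / n
        slope = cov / r_var
        intercept = x_mean - slope * r_mean
        resid_var = sum(
            (x - (intercept + slope * r)) ** 2 for x, r in zip(xs, relevance)
        ) / n
        params.append((intercept, slope, resid_var ** 0.5))
    return params


def generate(params, target_relevance, n_samples, rng):
    """Sample synthetic transitions conditioned on a chosen relevance level."""
    return [
        [rng.gauss(a + b * target_relevance, s) for a, b, s in params]
        for _ in range(n_samples)
    ]


# Toy "replay buffer": 3-dimensional transitions. The hypothetical relevance
# score is driven mostly by the first dimension.
rng = random.Random(0)
transitions = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(500)]
relevance = [2.0 * t[0] + 0.1 * rng.gauss(0, 1) for t in transitions]

params = fit_conditional_gaussian(transitions, relevance)
target = sorted(relevance)[int(0.9 * len(relevance))]  # roughly the 90th percentile
synthetic = generate(params, target, 256, rng)
```

Conditioning on a high relevance value shifts the generated transitions towards the region of the buffer that scored highly, while sampling from a fitted model (rather than re-drawing stored transitions) yields new points in that region instead of repeats of the few that were observed.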

Peer Reviews

Decision·ICLR 2025 Oral

Reviewer 01 · Rating 8 · Confidence 4

Strengths

* This work proposes a scalable method for training model-free or model-based agents in a variety of domains. I believe the formulation is simple enough to be integrated into and improve other approaches.
* I also found the presentation clear and easy to read.
* I found the scaling experiments to be very compelling. I'm a little concerned about the general thrust of driving up the syn-real data ratio as high as possible, since we do need to ground the generations in real experience. But I st…

Weaknesses

I have two points of contention with this work.

1. From a paradigm perspective, I don't understand how this is different from prior work in model-based RL that applies intrinsic rewards to a learned dynamics model [1] or world-model [2]. These methods also utilize a generative model as a copy of the environment, then train the agent in simulation to acquire interesting data (under the intrinsic reward). It seems that this method does the same, except that instances, rather than full trajectories…

Reviewer 02 · Rating 8 · Confidence 3

Strengths

One of the strengths of this paper is its clear and concise language, as well as the well-structured presentation of the proposed method. It is quite logical to improve on the already existing prioritized experience replay method and implement it in the generative domain. The method is explained well and should be quite easily reproducible. Overall the research could be a valuable contribution to the reinforcement learning community.

Weaknesses

One topic I feel was somewhat missed is the range of ways to approach generative replay, such as other generative models (e.g. variational autoencoders, Gaussian mixture models) and why they were not used. One thing I found rather off-putting, and this is very nitpicky, is that Tables 1, 2 and 3 are a bit crammed and slightly misaligned with each other.

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. The paper is well written and provides a clear explanation of their method.
2. The research problem addressed in the paper is well laid out and is an important one to improve the performance of RL methods.

Weaknesses

1. While the method shows improved performance, it is a bit simple as it combines existing elements in diffusion models and RL to propose the solution.
2. It would be useful to compare the effect of different kinds of exploration bonuses.

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Artificial Intelligence in Games

Methods: Diffusion