Replay across Experiments: A Natural Extension of Off-Policy RL

Dhruva Tirumala; Thomas Lampe; Jose Enrique Chen; Tuomas Haarnoja; Sandy Huang; Guy Lever; Ben Moran; Tim Hertweck; Leonard Hasenclever; Martin Riedmiller; Nicolas Heess; Markus Wulfmeier

arXiv:2311.15951 · cs.LG · November 29, 2023 · 1 citation

TL;DR

This paper introduces Replay Across Experiments (RaE), a simple framework that reuses experience data across multiple RL experiments to improve efficiency, exploration, and robustness in various challenging control tasks.

Contribution

The paper presents a novel, minimal-adaptation method for reusing experience across experiments, enhancing RL performance and research efficiency.

Findings

01. RaE improves controller performance across multiple RL algorithms.

02. Reusing experience enhances exploration in challenging tasks.

03. RaE demonstrates robustness to data quality and hyperparameter variations.

Abstract

Replaying data is a principal mechanism underlying the stability and data efficiency of off-policy reinforcement learning (RL). We present an effective yet simple framework to extend the use of replays across multiple experiments, minimally adapting the RL workflow for sizeable improvements in controller performance and research iteration times. At its core, Replay Across Experiments (RaE) involves reusing experience from previous experiments to improve exploration and bootstrap learning while reducing required changes to a minimum in comparison to prior work. We empirically show benefits across a number of RL algorithms and challenging control domains spanning both locomotion and manipulation, including hard exploration tasks from egocentric vision. Through comprehensive ablations, we demonstrate robustness to the quality and amount of data available and various hyperparameter choices.…
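
As a rough illustration of the idea in the abstract, the sketch below mixes stored transitions from earlier experiments with fresh online transitions when sampling training batches. It is not the authors' implementation: the class name MixedReplay, the prior_fraction parameter and its 0.5 default, and the uniform sampling scheme are all assumptions made for illustration.

```python
import random
from collections import deque


class MixedReplay:
    """Hypothetical replay that draws each batch from two sources:
    data saved from earlier experiments and data from the current run."""

    def __init__(self, prior_transitions, online_capacity=100_000,
                 prior_fraction=0.5, seed=0):
        # Fixed dataset reloaded from previous experiments (never overwritten).
        self.prior = list(prior_transitions)
        # FIFO buffer for transitions collected in the current experiment.
        self.online = deque(maxlen=online_capacity)
        # Assumed mixing ratio; treat it as a tunable hyperparameter.
        self.prior_fraction = prior_fraction
        self.rng = random.Random(seed)

    def add(self, transition):
        self.online.append(transition)

    def sample(self, batch_size):
        if not self.prior and not self.online:
            raise ValueError("no data to sample from")
        # If one source is empty, draw the whole batch from the other.
        if not self.online:
            n_prior = batch_size
        elif not self.prior:
            n_prior = 0
        else:
            n_prior = round(batch_size * self.prior_fraction)
        batch = [self.rng.choice(self.prior) for _ in range(n_prior)]
        batch += [self.online[self.rng.randrange(len(self.online))]
                  for _ in range(batch_size - n_prior)]
        self.rng.shuffle(batch)
        return batch
```

Any off-policy learner that trains on sampled batches could consume the output of sample() unchanged, which is one way to read the abstract's claim that the RL workflow needs only minimal adaptation.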

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

- The approach of keeping data from previous experiments seems to be very relevant in practical scenarios where our main goal is to train an agent with strong performance. In that case, given the cheap cost of memory, it would be sensible to keep data from previous runs for the benefit of future experiments. This seems like an understudied topic and it's great that this paper discusses it.
- There's a nice variety of environments that are used, including some more complex ones with a good mix

Weaknesses

- In the current paper, most of the experiments run the same learning agent on the same environment with the RaE algorithm, but the method is pitched as being helpful for boosting learning between different experiments with potential differences in experimental conditions. See Questions.
- There are other simple algorithms for this across-experiment setting that would be interesting to investigate. See Questions.

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

- The method is simple to implement compared to many existing data-reuse RL methods.
- The experiments in the paper are comprehensive, testing a multitude of diverse methods in a number of high-dimensional control environments. Several strong baselines from the literature are compared against. In spite of this simplicity, RaE can achieve strong performance in these tasks. The breadth of the results demonstrates the generality of RaE.
- The paper is well organized and includes nice discussions for

Weaknesses

- The main insight of the work, while interesting, is a small contribution. As the authors note in the background section, other methods already use data stored from previous experiments, so the only novelty here is that *data is not discarded*. There is no theoretical analysis in the paper. Without significant novelty or theory, the paper depends solely on its empirical results.
- Storing all previous data is memory intensive, which is why offline RL generally uses more complicated techniques t

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

- The authors present a very simple idea that leads to improved performance across a variety of environments.
- The proposed method works well even with a small amount of prior data.
- The proposed method works well even with low-return offline data, which makes the method much more useful in practice.

Weaknesses

- It is not clear to me if "Total online steps" in the figures includes the steps from prior experiments or not, so I'm concerned about the fairness of the comparison between RaE and baselines. If the authors can clarify this point then I may be willing to raise my score.

Taxonomy

Topics: Reinforcement Learning in Robotics · Robotic Locomotion and Control