Experience Replay for Continual Learning
David Rolnick, Arun Ahuja, Jonathan Schwarz, Timothy P. Lillicrap, Greg Wayne

TL;DR
This paper investigates the use of experience replay buffers to mitigate catastrophic forgetting in continual reinforcement learning without explicit task boundary signals, demonstrating effectiveness in Atari and DMLab environments.
Contribution
It introduces a simple, general approach using experience replay buffers for continual learning that performs well without requiring task labels or boundaries.
Findings
Replay buffers reduce catastrophic forgetting effectively.
A size-limited buffer that discards data at random performs nearly as well as an unlimited buffer.
The method matches the performance of task-aware approaches in reinforcement learning environments.
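The bounded-buffer finding can be illustrated with a minimal sketch. It assumes reservoir sampling as the random-discard rule, which is one standard way to keep a uniform sample of a stream in fixed memory; the summary above says only that data is discarded at random, so that choice, along with the class name `ReservoirReplayBuffer` and its parameters, is hypothetical and not taken from the paper.

```python
import random


class ReservoirReplayBuffer:
    """Fixed-capacity buffer that keeps a uniform random sample of a stream.

    Illustrative only: reservoir sampling is an assumed discard rule,
    not necessarily the one used by the authors.
    """

    def __init__(self, capacity, seed=None):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.items = []
        self.num_seen = 0  # total experiences offered to the buffer
        self._rng = random.Random(seed)

    def add(self, item):
        """Offer one experience; it may overwrite a stored one at random."""
        self.num_seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
            return
        # Keep the new item with probability capacity / num_seen.
        slot = self._rng.randrange(self.num_seen)
        if slot < self.capacity:
            self.items[slot] = item

    def sample(self, batch_size):
        """Draw a batch without replacement from the stored experiences."""
        k = min(batch_size, len(self.items))
        return self._rng.sample(self.items, k)


# Three tasks presented in sequence. The task id is stored only so the
# buffer contents can be inspected; the buffer never uses it and is
# never told where one task ends and the next begins.
buffer = ReservoirReplayBuffer(capacity=100, seed=0)
for task in range(3):
    for step in range(1000):
        buffer.add((task, step))

replay_batch = buffer.sample(32)
```

Because every experience in the stream ends up stored with equal probability, the buffer is expected to hold a roughly even mix of all three tasks even though it holds only 100 of the 3000 experiences. No task label is needed to decide what to keep.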
Abstract
Continual learning is the problem of learning new tasks or knowledge while protecting old knowledge and ideally generalizing from old experience to learn new tasks faster. Neural networks trained by stochastic gradient descent often degrade on old tasks when trained successively on new tasks with different data distributions. This phenomenon, referred to as catastrophic forgetting, is considered a major hurdle to learning with non-stationary data or sequences of new tasks, and prevents networks from continually accumulating knowledge and skills. We examine this issue in the context of reinforcement learning, in a setting where an agent is exposed to tasks in a sequence. Unlike most other work, we do not provide an explicit indication to the model of task boundaries, which is the most general circumstance for a learning agent exposed to continuous experience. While various methods to…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Bandit Algorithms Research · Online Learning and Analytics
Methods: Experience Replay
