Same State, Different Task: Continual Reinforcement Learning without Interference
Samuel Kessler, Jack Parker-Holder, Philip Ball, Stefan Zohren, Stephen J. Roberts

TL;DR
This paper introduces OWL, a method for continual reinforcement learning that prevents interference between tasks by using separate policy heads and bandit-based policy selection, outperforming existing replay methods.
Contribution
The paper formalizes interference as distinct from forgetting and proposes OWL, a factorized policy approach with bandit-based selection to address interference in continual RL.
Findings
OWL outperforms existing replay-based CL methods in multiple RL environments.
OWL effectively prevents interference between incompatible tasks.
Bandit-based policy selection enables optimal task-specific policy reuse.
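To make the idea of bandit-based selection over policy heads concrete, here is a minimal sketch in Python. It is not the authors' implementation: the `HeadSelector` class, the EXP3-style update, and the `ASSUMED_LOSS` values are all illustrative assumptions. It only shows how a bandit can concentrate probability on the head that receives the lowest feedback loss when the task identity is hidden.

```python
import math
import random


class HeadSelector:
    """EXP3-style bandit over policy heads (hypothetical helper, not the authors' code)."""

    def __init__(self, n_heads, lr=0.1, explore=0.1, seed=0):
        self.log_w = [0.0] * n_heads  # one log-weight per policy head
        self.lr = lr                  # step size for the weight update
        self.explore = explore        # share of probability spread uniformly
        self.rng = random.Random(seed)

    def probs(self):
        # Softmax over log-weights, mixed with uniform exploration.
        m = max(self.log_w)
        exp_w = [math.exp(w - m) for w in self.log_w]
        total = sum(exp_w)
        k = len(exp_w)
        return [(1 - self.explore) * w / total + self.explore / k for w in exp_w]

    def select(self):
        # Sample a head index according to the current probabilities.
        return self.rng.choices(range(len(self.log_w)), weights=self.probs())[0]

    def update(self, head, loss):
        # Importance-weighted loss: only the chosen head is updated,
        # scaled by the inverse of its selection probability.
        self.log_w[head] -= self.lr * loss / self.probs()[head]


# Assumed feedback in [0, 1]: head 1 matches the hidden task, the others do not.
ASSUMED_LOSS = [0.9, 0.1, 0.9]

selector = HeadSelector(n_heads=3)
for _ in range(500):
    head = selector.select()
    selector.update(head, ASSUMED_LOSS[head])

final_probs = selector.probs()
best_head = max(range(3), key=lambda i: final_probs[i])
```

In a real agent the loss would come from a signal computed during interaction with the environment rather than from a fixed table; the fixed table here is a stand-in so the sketch stays self-contained.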
Abstract
Continual Learning (CL) considers the problem of training an agent sequentially on a set of tasks while seeking to retain performance on all previous tasks. A key challenge in CL is catastrophic forgetting, which arises when performance on a previously mastered task is reduced when learning a new task. While a variety of methods exist to combat forgetting, in some cases tasks are fundamentally incompatible with each other and thus cannot be learnt by a single policy. This can occur in reinforcement learning (RL) when an agent is rewarded for achieving different goals from the same observation. In this paper we formalize this "interference" as distinct from the problem of forgetting. We show that existing CL methods based on single neural network predictors with shared replay buffers fail in the presence of interference. Instead, we propose a simple method, OWL, to address this…
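The interference described in the abstract can be illustrated with a toy example. This is a sketch under simplifying assumptions, not the paper's formalization: a single observation, two actions, and two tasks (`task_a`, `task_b`, both hypothetical names) that reward opposite actions. A single value estimator fit on a shared buffer averages the conflicting rewards, while per-task estimators keep them apart.

```python
# Two tasks share the same observation but reward opposite actions
# (illustrative values, not taken from the paper).
REWARDS = {
    "task_a": {"left": 1.0, "right": 0.0},
    "task_b": {"left": 0.0, "right": 1.0},
}


def buffer_for(task, n=10):
    """Build a replay buffer of (action, reward) pairs for one task."""
    return [(a, r) for a, r in REWARDS[task].items() for _ in range(n)]


def fit_values(transitions):
    """Least-squares value per action, i.e. the mean reward seen for it."""
    totals, counts = {}, {}
    for action, reward in transitions:
        totals[action] = totals.get(action, 0.0) + reward
        counts[action] = counts.get(action, 0) + 1
    return {a: totals[a] / counts[a] for a in totals}


# One predictor, one shared buffer: conflicting targets are averaged.
shared = fit_values(buffer_for("task_a") + buffer_for("task_b"))

# One predictor per task: each keeps its own preferred action.
per_task = {task: fit_values(buffer_for(task)) for task in REWARDS}

print("shared:", shared)
print("per task:", per_task)
```

With the shared buffer both actions end up with the same estimated value, so no greedy choice can be right for both tasks. This differs from forgetting: the data for both tasks is present, yet a single predictor still cannot represent both.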
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
