Same State, Different Task: Continual Reinforcement Learning without Interference

Samuel Kessler; Jack Parker-Holder; Philip Ball; Stefan Zohren; Stephen J. Roberts

arXiv:2106.02940·cs.LG·March 16, 2022

TL;DR

This paper introduces OWL, a method for continual reinforcement learning that prevents interference between tasks by using separate policy heads and bandit-based policy selection, outperforming existing replay methods.

Contribution

The paper formalizes interference as distinct from forgetting and proposes OWL, a factorized policy approach with bandit-based selection to address interference in continual RL.
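
To make the distinction between interference and forgetting concrete, here is a minimal sketch that is not taken from the paper: one observation, two actions, and two hypothetical tasks (`task_a`, `task_b`) with opposite rewards. All names and reward values are assumptions chosen for illustration. Under these assumptions, any single policy trades one task off against the other, while one policy head per task can solve both.

```python
# Toy setting (assumed, not from the paper): a single state, two actions,
# and two tasks that reward opposite actions in that same state.
REWARDS = {
    "task_a": {"left": 1.0, "right": 0.0},
    "task_b": {"left": 0.0, "right": 1.0},
}


def expected_return(p_left, task):
    """Expected one-step reward on `task` for a policy choosing 'left' w.p. p_left."""
    reward = REWARDS[task]
    return p_left * reward["left"] + (1.0 - p_left) * reward["right"]


def single_policy_returns(p_left):
    """Returns on every task when one shared policy is used for all of them."""
    return {task: expected_return(p_left, task) for task in REWARDS}


def per_task_head_returns(heads):
    """Returns when each task has its own head; heads[task] is that head's p_left."""
    return {task: expected_return(heads[task], task) for task in REWARDS}


if __name__ == "__main__":
    # A shared policy: gaining on one task costs exactly as much on the other.
    for p_left in (0.0, 0.5, 1.0):
        print(p_left, single_policy_returns(p_left))
    # Separate heads: each task can be solved independently.
    print(per_task_head_returns({"task_a": 1.0, "task_b": 0.0}))
```

In this toy case the two returns of a shared policy always sum to 1, so no amount of replay can make one policy optimal on both tasks. That is the sense in which interference differs from forgetting.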

Findings

01

OWL outperforms existing replay-based CL methods in multiple RL environments.

02

OWL effectively prevents interference between incompatible tasks.

03

Bandit-based policy selection enables optimal task-specific policy reuse.
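
As a rough illustration of bandit-based head selection, the sketch below uses EXP3, a standard adversarial bandit algorithm, to choose among policy heads from a per-round loss. The summary above does not say which bandit algorithm or feedback signal OWL uses, so EXP3, the class `Exp3HeadSelector`, the helper `run_selection`, and the synthetic losses are all assumptions rather than the authors' implementation.

```python
import math
import random


class Exp3HeadSelector:
    """EXP3-style bandit over policy heads (illustrative, hypothetical class)."""

    def __init__(self, num_heads, learning_rate=0.1, seed=0):
        # Log-weights are stored instead of raw weights for numerical stability.
        self.log_weights = [0.0] * num_heads
        self.learning_rate = learning_rate
        self.rng = random.Random(seed)

    def probabilities(self):
        """Softmax of the log-weights: the current distribution over heads."""
        top = max(self.log_weights)
        exps = [math.exp(lw - top) for lw in self.log_weights]
        total = sum(exps)
        return [e / total for e in exps]

    def select(self):
        """Sample a head index from the current distribution."""
        probs = self.probabilities()
        return self.rng.choices(range(len(probs)), weights=probs)[0]

    def update(self, head, loss):
        """Penalise the chosen head by its importance-weighted loss in [0, 1]."""
        probs = self.probabilities()
        self.log_weights[head] -= self.learning_rate * loss / probs[head]


def run_selection(head_losses, rounds=500, learning_rate=0.1, seed=0):
    """Simulate selection when head i always incurs the loss head_losses[i]."""
    selector = Exp3HeadSelector(len(head_losses), learning_rate, seed)
    for _ in range(rounds):
        head = selector.select()
        selector.update(head, head_losses[head])
    return selector.probabilities()


if __name__ == "__main__":
    # Synthetic losses (assumed): head 1 fits the current task better than head 0.
    print(run_selection([0.9, 0.1]))
```

In an actual agent the loss would come from how well each head explains or performs on the current task. Here it is a fixed synthetic number, so the example only shows the selection mechanics.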

Abstract

Continual Learning (CL) considers the problem of training an agent sequentially on a set of tasks while seeking to retain performance on all previous tasks. A key challenge in CL is catastrophic forgetting, which arises when learning a new task reduces performance on a previously mastered one. While a variety of methods exist to combat forgetting, in some cases tasks are fundamentally incompatible with each other and thus cannot be learnt by a single policy. This can occur in reinforcement learning (RL) when an agent is rewarded for achieving different goals from the same observation. In this paper we formalize this "interference" as distinct from the problem of forgetting. We show that existing CL methods based on single neural network predictors with shared replay buffers fail in the presence of interference. Instead, we propose a simple method, OWL, to address this…

Code & Models

Repositories

skezle/owl (PyTorch, official)

Videos

Same State, Different Task: Continual Reinforcement Learning without Interference · Underline

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications