A Dirichlet Process Mixture of Robust Task Models for Scalable Lifelong Reinforcement Learning

Zhi Wang; Chunlin Chen; Daoyi Dong

arXiv:2205.10787·cs.LG·May 24, 2022

TL;DR

This paper introduces a scalable lifelong reinforcement learning approach using a Dirichlet process mixture model that dynamically expands and adapts to new tasks, preventing forgetting and improving generalization.

Contribution

The paper proposes a non-parametric Bayesian framework for lifelong RL that automatically adjusts model complexity and clusters tasks without explicit boundaries or heuristics.

Findings

1. Outperforms existing methods in robot navigation and locomotion tasks.
2. Effectively prevents catastrophic forgetting in lifelong learning.
3. Demonstrates scalable adaptation to non-stationary task distributions.

Abstract

While reinforcement learning (RL) algorithms are achieving state-of-the-art performance in various challenging tasks, they can easily suffer from catastrophic forgetting or interference when faced with lifelong streaming information. In this paper, we propose a scalable lifelong RL method that dynamically expands the network capacity to accommodate new knowledge while preventing past memories from being perturbed. We use a Dirichlet process mixture to model the non-stationary task distribution; it captures task relatedness by estimating the likelihood of task-to-cluster assignments and clusters the task models in a latent space. We formulate the prior distribution of the mixture as a Chinese restaurant process (CRP) that instantiates new mixture components as needed. The update and expansion of the mixture are governed by the Bayesian non-parametric framework with an expectation…
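
To make the CRP prior mentioned in the abstract more concrete, here is a minimal sketch of the standard textbook Chinese restaurant process, not the authors' implementation. The function names `crp_prior` and `sample_crp_assignments` and the concentration parameter `alpha` are assumptions introduced for illustration. Under this prior, a new task joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to `alpha`.

```python
import random


def crp_prior(counts, alpha):
    """CRP prior over cluster assignments for the next task.

    counts: number of tasks already assigned to each existing cluster.
    alpha:  concentration parameter (hypothetical name; larger values
            make opening a new cluster more likely).
    Returns one probability per existing cluster, followed by the
    probability of instantiating a new cluster.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    total = sum(counts) + alpha
    probs = [n / total for n in counts]
    probs.append(alpha / total)  # last entry: open a new cluster
    return probs


def sample_crp_assignments(num_tasks, alpha, seed=0):
    """Draw cluster assignments for a stream of tasks from the CRP prior alone."""
    rng = random.Random(seed)
    counts = []       # tasks per cluster, grows as clusters are created
    assignments = []  # cluster index chosen for each task
    for _ in range(num_tasks):
        probs = crp_prior(counts, alpha)
        k = rng.choices(range(len(probs)), weights=probs)[0]
        if k == len(counts):
            counts.append(1)  # a new mixture component is instantiated
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts


if __name__ == "__main__":
    # Two clusters holding 3 and 1 tasks, alpha = 1.0.
    print(crp_prior([3, 1], alpha=1.0))
    print(sample_crp_assignments(num_tasks=20, alpha=1.0))
```

This sketch samples from the prior only. In the method the abstract describes, the prior would be combined with the estimated likelihood of each task-to-cluster assignment before a cluster is chosen; that step is omitted here because the page does not specify it.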
