Provably Efficient Lifelong Reinforcement Learning with Linear Function Approximation

Sanae Amani; Lin F. Yang; Ching-An Cheng

arXiv:2206.00270·cs.LG·June 2, 2022

Open Access

TL;DR

This paper introduces UCBlvd, an algorithm for lifelong reinforcement learning with linear function approximation that achieves sublinear regret across a stream of tasks while making only a sublinear number of planning calls.

Contribution

The paper proposes a novel algorithm, UCBlvd, that guarantees sublinear regret and minimal planning calls in lifelong RL with linear function approximation, under a new structural assumption.

Findings

01

Achieves a regret bound of $\tilde{O}(\sqrt{(d^3 + d'd)H^4K})$

02

Uses only $H\log(K)$ planning calls, enabling efficient learning

03

Supports rapid adaptation to new tasks in the lifelong learning setting
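
To make the scaling in the findings above concrete, here is a minimal Python sketch. It assumes the regret bound reads as the square root of (d^3 + d'd)H^4K and the planning-call count as H log(K), with all constants and logarithmic factors dropped. The function names and the example values of d, d', and H are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def regret_bound_scale(d, d_prime, H, K):
    """Leading term sqrt((d^3 + d'*d) * H^4 * K).

    Assumption: constants and log factors hidden by the tilde-O are ignored.
    """
    return math.sqrt((d**3 + d_prime * d) * H**4 * K)


def planning_calls_scale(H, K):
    """Leading term H * log(K), as listed in the findings (constants ignored)."""
    return H * math.log(K)


if __name__ == "__main__":
    d, d_prime, H = 10, 5, 20  # hypothetical problem sizes
    for K in (10**2, 10**4, 10**6):
        total = regret_bound_scale(d, d_prime, H, K)
        # Average regret per episode shrinks like 1/sqrt(K).
        print(K, total / K, planning_calls_scale(H, K) / K)
```

Under these assumptions, both the average regret per episode and the fraction of episodes that trigger planning shrink as K grows, which is the sense in which the bounds are sublinear in K.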

Abstract

We study lifelong reinforcement learning (RL) in a regret minimization setting of linear contextual Markov decision processes (MDPs), where the agent needs to learn a multi-task policy while solving a streaming sequence of tasks. We propose an algorithm, called UCB Lifelong Value Distillation (UCBlvd), that provably achieves sublinear regret for any sequence of tasks, which may be adaptively chosen based on the agent's past behaviors. Remarkably, our algorithm uses only a sublinear number of planning calls, which means that the agent eventually learns a policy that is near optimal for multiple tasks (seen or unseen) without the need for deliberate planning. A key to this property is a new structural assumption that enables computation sharing across tasks during exploration. Specifically, for $K$ task episodes of horizon $H$, our algorithm has a regret bound…

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI)