Provably Efficient Lifelong Reinforcement Learning with Linear Function Approximation
Sanae Amani, Lin F. Yang, Ching-An Cheng

TL;DR
This paper introduces UCBlvd, an algorithm for lifelong reinforcement learning with linear function approximation that achieves sublinear regret across a stream of tasks while making only a sublinear number of planning calls.
Contribution
The paper proposes a novel algorithm, UCBlvd, that guarantees sublinear regret with a sublinear number of planning calls in lifelong RL with linear function approximation, under a new structural assumption that lets computation be shared across tasks.
Findings
Achieves a regret bound of \(\tilde{O}(\sqrt{(d^3 + d'd)H^4K})\)
Uses only \(H\log(K)\) planning calls, enabling efficient learning
Supports rapid adaptation to new tasks in the lifelong learning setting
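To make the scaling in these findings concrete, the following minimal sketch evaluates the two expressions numerically. It assumes the regret bound has the form \(\sqrt{(d^3 + d'd)H^4K}\) and the planning-call count the form \(H\log(K)\), and it drops all constants and the logarithmic factors hidden by \(\tilde{O}\). Reading d and d' as feature dimensions, H as the horizon, and K as the number of task episodes is an interpretation, and the numeric values are hypothetical; none of this is the authors' code.

```python
import math


def regret_bound(d, d_prime, H, K):
    """Evaluate sqrt((d^3 + d'*d) * H^4 * K), with constants and log factors dropped."""
    return math.sqrt((d**3 + d_prime * d) * H**4 * K)


def planning_calls(H, K):
    """Evaluate H * log(K), with constants dropped."""
    return H * math.log(K)


if __name__ == "__main__":
    # Hypothetical problem sizes, chosen only for illustration.
    d, d_prime, H = 10, 5, 20
    for K in (10**3, 10**4, 10**5, 10**6):
        avg_regret = regret_bound(d, d_prime, H, K) / K
        avg_calls = planning_calls(H, K) / K
        print(f"K={K:>8}  bound/K={avg_regret:12.2f}  planning calls/K={avg_calls:.6f}")
```

Both per-episode quantities shrink as K grows, which is what "sublinear" means here: the bound grows like the square root of K, and the planning-call count grows only logarithmically in K.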
Abstract
We study lifelong reinforcement learning (RL) in a regret minimization setting of linear contextual Markov decision processes (MDPs), where the agent needs to learn a multi-task policy while solving a streaming sequence of tasks. We propose an algorithm, called UCB Lifelong Value Distillation (UCBlvd), that provably achieves sublinear regret for any sequence of tasks, which may be adaptively chosen based on the agent's past behaviors. Remarkably, our algorithm uses only a sublinear number of planning calls, which means that the agent eventually learns a policy that is near optimal for multiple tasks (seen or unseen) without the need for deliberate planning. A key to this property is a new structural assumption that enables computation sharing across tasks during exploration. Specifically, for K task episodes of horizon H, our algorithm has a regret bound…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI)
