The Online Coupon-Collector Problem and Its Application to Lifelong Reinforcement Learning
Emma Brunskill, Lihong Li

TL;DR
This paper introduces an online coupon-collector problem to analyze lifelong reinforcement learning and gives an optimal algorithm for it. That result underpins a lifelong RL algorithm whose overall sample complexity across a sequence of tasks is much smaller than that of single-task learning, demonstrated in simulations including human-robot interaction.
Contribution
It formulates a novel online coupon-collector problem, gives an optimal algorithm for that problem, and uses it to develop a lifelong RL algorithm with better sample efficiency than single-task learning.
Findings
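The lifelong RL algorithm has much smaller overall sample complexity across a sequence of tasks than single-task learning, even when the task sequence is generated by an adversary.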
Demonstrated effectiveness in simulated human-robot interaction tasks.
Provides theoretical analysis for cross-task exploration in RL.
Abstract
Transferring knowledge across a sequence of related tasks is an important challenge in reinforcement learning (RL). Despite much encouraging empirical evidence, there has been little theoretical analysis. In this paper, we study a class of lifelong RL problems: the agent solves a sequence of tasks modeled as finite Markov decision processes (MDPs), each of which is from a finite set of MDPs with the same state/action sets and different transition/reward functions. Motivated by the need for cross-task exploration in lifelong learning, we formulate a novel online coupon-collector problem and give an optimal algorithm. This allows us to develop a new lifelong RL algorithm, whose overall sample complexity in a sequence of tasks is much smaller than single-task learning, even if the sequence of tasks is generated by an adversary. Benefits of the algorithm are demonstrated in simulated…
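As background for the coupon-collector name, the sketch below covers only the classical (offline) problem: each draw returns one of n coupon types uniformly at random, and the expected number of draws needed to see every type is n·H_n, where H_n is the n-th harmonic number. This is the textbook setting, not the paper's online variant or its algorithm. The function names are illustrative assumptions, and reading coupon types as the distinct MDPs in the finite task set is only a loose analogy.

```python
from fractions import Fraction
import random


def expected_draws(n):
    """Exact expected number of uniform draws to see all n coupon types (n * H_n)."""
    return n * sum(Fraction(1, k) for k in range(1, n + 1))


def simulate_draws(n, rng):
    """Draw uniformly from n types until every type has been seen; return the draw count."""
    seen = set()
    draws = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only for reproducibility
    n = 10
    trials = [simulate_draws(n, rng) for _ in range(2000)]
    print("exact expectation:", float(expected_draws(n)))
    print("simulated mean:   ", sum(trials) / len(trials))
```

Comparing the exact expectation with the simulated mean gives a feel for how many samples are needed before every type has appeared at least once.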
Taxonomy
Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Advanced Bandit Algorithms Research
