Adversarial Online Multi-Task Reinforcement Learning

Quan Nguyen; Nishant A. Mehta

arXiv:2301.04268 · cs.LG · January 12, 2023 · 1 citation

TL;DR

This paper studies adversarial online multi-task reinforcement learning, establishing fundamental lower bounds and proposing an algorithm that nearly matches these bounds, advancing understanding of task separation and sample efficiency.

Contribution

The paper introduces a new 2-JAO MDP construction, derives tight lower bounds on regret and sample complexity, and presents a polynomial-time algorithm with near-optimal guarantees.

Findings

01

Lower bound of $\Omega(K\sqrt{DSAH})$ on regret.

02

Instance-specific lower bound of $\Omega(\frac{K}{\lambda^2})$ on sample complexity.

03

Proposed algorithm achieves near-optimal sample and regret bounds.
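
To get a feel for how the two lower-bound expressions in Findings 01 and 02 scale, the following is a minimal sketch that evaluates the quantities inside the $\Omega(\cdot)$ with all hidden constants dropped. The function names are hypothetical, and since this page does not define $D$, $S$, $A$, and $H$, they are treated here simply as positive numbers; the outputs are growth rates, not actual regret or sample counts.

```python
import math


def regret_lower_bound_scale(K, D, S, A, H):
    """Evaluate K * sqrt(D*S*A*H), the expression inside Omega(.) in Finding 01.

    Constants hidden by the Omega notation are omitted (assumption: all
    arguments are positive numbers).
    """
    if min(K, D, S, A, H) <= 0:
        raise ValueError("all parameters must be positive")
    return K * math.sqrt(D * S * A * H)


def sample_complexity_lower_bound_scale(K, lam):
    """Evaluate K / lam**2, the expression inside Omega(.) in Finding 02.

    `lam` stands for the separation parameter lambda; constants are omitted.
    """
    if K <= 0 or lam <= 0:
        raise ValueError("K and lam must be positive")
    return K / lam ** 2


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    print(regret_lower_bound_scale(K=100, D=4, S=4, A=2, H=8))   # 1600.0
    print(sample_complexity_lower_bound_scale(K=100, lam=0.5))   # 400.0
```

Both expressions are linear in the number of episodes $K$, while halving $\lambda$ quadruples the sample-complexity expression.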

Abstract

We consider the adversarial online multi-task reinforcement learning setting, where in each of $K$ episodes the learner is given an unknown task taken from a finite set $\mathcal{M}$ of $M$ unknown finite-horizon MDP models. The learner's objective is to minimize its regret with respect to the optimal policy for each task. We assume the MDPs in $\mathcal{M}$ are well-separated under a notion of $\lambda$-separability, and show that this notion generalizes many task-separability notions from previous works. We prove a minimax lower bound of $\Omega(K\sqrt{DSAH})$ on the regret of any learning algorithm and an instance-specific lower bound of $\Omega(\frac{K}{\lambda^{2}})$ on sample complexity for a class of uniformly-good cluster-then-learn algorithms. We use a novel construction called 2-JAO MDP for proving the instance-specific lower bound. The lower bounds are complemented with a polynomial time…

Code & Models

Repositories

ngmq/adversarial-online-multi-task-reinforcement-learning (Official)

Taxonomy

Topics: Machine Learning and Algorithms · Advanced Bandit Algorithms Research · Optimization and Search Problems