Test-Time Regret Minimization in Meta Reinforcement Learning

Mirco Mutti; Aviv Tamar

arXiv:2406.02282 · cs.LG · June 5, 2024

TL;DR

This paper studies the statistical limits of test-time regret minimization in meta reinforcement learning. It establishes a lower bound on regret and proposes assumptions under which the agent can learn faster on the test task.

Contribution

It proves a lower bound on test-time regret showing that the known $O(M^2 \log H)$ rate under separation is nearly optimal. It also proposes stronger assumptions that enable faster regret minimization rates.

Findings

01

The lower bound shows that a linear dependence of regret on the number of tasks is unavoidable.

02

Strong identifiability assumptions enable regret that is logarithmic in the number of test episodes and sublinear in the number of tasks.

03

The results clarify the statistical barriers to test-time regret minimization.
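
For reference, the test-time regret discussed in these findings can be written as below. The notation is assumed here and may differ from the paper's: $m^\star$ is the unknown test task, $\pi_h$ is the policy played in test episode $h$, and $V^{\pi}_{m}$ is the expected return of $\pi$ in task $m$.

```latex
\mathcal{R}(H) = \sum_{h=1}^{H} \left( V^{\star}_{m^\star} - V^{\pi_h}_{m^\star} \right),
\qquad V^{\star}_{m} = \max_{\pi} V^{\pi}_{m},
\qquad m^\star \in \{1, \dots, M\}
```

Under the separation condition, Chen et al. (2022) obtain $\mathcal{R}(H) = O(M^2 \log H)$. The lower bound rules out a sublinear dependence on $M$ in that setting.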

Abstract

Meta reinforcement learning sets a distribution over a set of tasks on which the agent can train at will, then is asked to learn an optimal policy for any test task efficiently. In this paper, we consider a finite set of tasks modeled through Markov decision processes with various dynamics. We assume to have endured a long training phase, from which the set of tasks is perfectly recovered, and we focus on regret minimization against the optimal policy in the unknown test task. Under a separation condition that states the existence of a state-action pair revealing a task against another, Chen et al. (2022) show that $O(M^2 \log H)$ regret can be achieved, where $M, H$ are the number of tasks in the set and test episodes, respectively. In our first contribution, we demonstrate that the latter rate is nearly optimal by developing a novel lower bound for test-time regret minimization under…
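
A small sketch can make the separation condition more concrete. It makes the following assumptions:

- Tasks are tabular, share states and actions, and differ only in their transition probabilities.
- A state-action pair reveals task m against task k when the total-variation distance between their next-state distributions is at least a hypothetical threshold `gap`.

Under these assumptions, the sketch identifies the test task with a simple likelihood tournament.

The data layout, `gap`, and the helpers `tv_distance`, `revealing_pairs`, `identify_task`, and `sample_next_state` are illustrative assumptions. They are not the procedure of Chen et al. (2022) or of this paper. In particular, `sample_next_state` pretends the agent can query any state-action pair directly. It ignores the cost of reaching that pair during an episode, which is where regret actually accrues.

```python
import math
import random
from itertools import combinations

# Hypothetical tabular layout: P[m][s][a] is the list of next-state
# probabilities of task m at state s under action a. All tasks share the
# same states and actions and differ only in their dynamics.


def tv_distance(p, q):
    """Total-variation distance between two discrete distributions."""
    return 0.5 * sum(abs(x - y) for x, y in zip(p, q))


def revealing_pairs(P, gap):
    """For every pair of tasks (m, k) with m < k, return the state-action pair
    whose next-state distributions differ most. Raise ValueError if that
    difference is below `gap` (the pair is not separated in this sense)."""
    n_states, n_actions = len(P[0]), len(P[0][0])
    state_actions = [(s, a) for s in range(n_states) for a in range(n_actions)]
    pairs = {}
    for m, k in combinations(range(len(P)), 2):
        dist = {sa: tv_distance(P[m][sa[0]][sa[1]], P[k][sa[0]][sa[1]])
                for sa in state_actions}
        best = max(state_actions, key=dist.get)
        if dist[best] < gap:
            raise ValueError(f"tasks {m} and {k} are not separated")
        pairs[(m, k)] = best
    return pairs


def identify_task(P, pairs, sample_next_state, n_samples):
    """Likelihood tournament: the current winner duels each remaining task,
    using samples from the unknown test task at their revealing pair."""
    winner = 0
    for challenger in range(1, len(P)):
        m, k = winner, challenger  # winner < challenger, matching `pairs` keys
        s, a = pairs[(m, k)]
        loglik = {m: 0.0, k: 0.0}
        for _ in range(n_samples):
            s_next = sample_next_state(s, a)
            for c in (m, k):
                # Floor the probability to avoid log(0) on impossible transitions.
                loglik[c] += math.log(max(P[c][s][a][s_next], 1e-12))
        winner = m if loglik[m] >= loglik[k] else k
    return winner


# Usage: three toy tasks over 2 states and 2 actions that differ only at (s=0, a=1).
uniform = [[0.5, 0.5], [0.5, 0.5]]
P = [
    [[[0.5, 0.5], [0.9, 0.1]], uniform],
    [[[0.5, 0.5], [0.1, 0.9]], uniform],
    [[[0.5, 0.5], [0.5, 0.5]], uniform],
]
pairs = revealing_pairs(P, gap=0.3)

true_task = 1  # unknown to the agent; only used here to simulate transitions
rng = random.Random(0)


def sample_next_state(s, a):
    """Draw a next state from the (simulated) test task."""
    dist = P[true_task][s][a]
    return rng.choices(range(len(dist)), weights=dist)[0]


estimate = identify_task(P, pairs, sample_next_state, n_samples=200)
```

Design note: `revealing_pairs` computes all $M(M-1)/2$ pairs, but the tournament runs only $M - 1$ duels. It therefore consults one revealing pair per challenger.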
