POMRL: No-Regret Learning-to-Plan with Increasing Horizons

Khimya Khetarpal; Claire Vernade; Brendan O'Donoghue; Satinder Singh; Tom Zahavy

arXiv:2212.14530·cs.AI·January 2, 2023

Open Access

TL;DR

This paper introduces POMRL, a meta-reinforcement learning algorithm that learns to plan with increasing horizons, leveraging experience across tasks to improve planning efficiency and reduce regret.

Contribution

It proposes a novel meta-learning approach for planning under model uncertainty, with theoretical regret bounds and heuristics for adaptive planning horizons.

Findings

1. Average regret over tasks decreases as the number of tasks increases and as the tasks become more similar.

2. Heuristics for increasing planning horizons improve empirical performance.

3. Theoretical analysis links the planning horizon to model accuracy and to the amount of data available per task.

Abstract

We study the problem of planning under model uncertainty in an online meta-reinforcement learning (RL) setting where an agent is presented with a sequence of related tasks with limited interactions per task. The agent can use its experience in each task and across tasks to estimate both the transition model and the distribution over tasks. We propose an algorithm to meta-learn the underlying structure across tasks, utilize it to plan in each task, and upper-bound the regret of the planning loss. Our bound suggests that the average regret over tasks decreases as the number of tasks increases and as the tasks are more similar. In the classical single-task setting, it is known that the planning horizon should depend on the estimated model's accuracy, that is, on the number of samples within task. We generalize this finding to meta-RL and study this dependence of planning horizons on the…
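
The classical single-task observation mentioned in the abstract, that the planning horizon should depend on the accuracy of the estimated model, can be illustrated with a small tabular sketch. This is not the POMRL algorithm: it only plans in a transition model estimated from a few samples using a planning discount factor `gamma_plan`, then measures the planning loss of the resulting policy in the true model at the evaluation discount `gamma_eval`. The random MDP, the sample count, and all function names are illustrative assumptions.

```python
import numpy as np

def value_iteration(P, R, gamma, tol=1e-10, max_iters=10_000):
    """Greedy policy for transitions P (S, A, S) and rewards R (S, A)."""
    V = np.zeros(P.shape[0])
    for _ in range(max_iters):
        Q = R + gamma * (P @ V)          # (S, A) action values
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return Q.argmax(axis=1)

def policy_value(P, R, policy, gamma):
    """Exact value of a deterministic policy, via a linear solve."""
    idx = np.arange(P.shape[0])
    P_pi, R_pi = P[idx, policy], R[idx, policy]
    return np.linalg.solve(np.eye(len(idx)) - gamma * P_pi, R_pi)

def planning_loss(P_true, P_hat, R, gamma_plan, gamma_eval):
    """Per-state loss of planning in P_hat with gamma_plan, evaluated in P_true."""
    pi_hat = value_iteration(P_hat, R, gamma_plan)
    pi_star = value_iteration(P_true, R, gamma_eval)
    v_star = policy_value(P_true, R, pi_star, gamma_eval)
    v_hat = policy_value(P_true, R, pi_hat, gamma_eval)
    return v_star - v_hat

# Hypothetical toy problem: a random MDP and a model estimated from few samples.
rng = np.random.default_rng(0)
S, A, n_samples, gamma_eval = 5, 2, 5, 0.9
P_true = rng.dirichlet(np.ones(S), size=(S, A))   # shape (S, A, S)
R = rng.uniform(size=(S, A))
counts = np.array([[rng.multinomial(n_samples, P_true[s, a]) for a in range(A)]
                   for s in range(S)])
P_hat = counts / n_samples                         # empirical transition model

for gamma_plan in (0.3, 0.6, 0.9):
    loss = planning_loss(P_true, P_hat, R, gamma_plan, gamma_eval)
    print(f"gamma_plan={gamma_plan:.1f}  mean planning loss={loss.mean():.4f}")
```

On a single random draw the ordering of the losses is not guaranteed. Averaging over many sampled models and varying `n_samples` is what would show how the best `gamma_plan` shifts with model accuracy.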

Taxonomy

Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Reinforcement Learning in Robotics