Provably adaptive reinforcement learning in metric spaces

Tongyi Cao; Akshay Krishnamurthy

arXiv:2006.10875 · cs.LG · October 22, 2021 · 1 citation

TL;DR

This paper gives a refined analysis of a variant of the Sinclair, Banerjee, and Yu (2019) algorithm for reinforcement learning in continuous metric spaces, showing regret bounds that adapt to the difficulty of the instance through the zooming dimension, a concept from bandit theory.

Contribution

It provides the first provably adaptive regret guarantees for reinforcement learning in metric spaces, refining previous analyses, which were stated in terms of the covering dimension, by using the zooming dimension instead.

Findings

1. Regret scales with the zooming dimension of the instance.
2. The zooming dimension never exceeds the covering dimension, so the resulting bounds are never looser and often tighter.
3. These are the first adaptive guarantees for RL in metric spaces.
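To make the difference between the two dimensions concrete, here is a minimal Python sketch on a hypothetical one-dimensional bandit. It is a single-step simplification (the paper's setting is episodic RL, where gaps are defined per step), it assumes the reward function mu(x) = 1 - |x - 0.5| on [0, 1], and it takes the near-optimal set at scale r to be {x : gap(x) <= r}, whereas the bandit definitions use a constant multiple of r. Each dimension is estimated as the log-log slope of a covering number against 1/r. The names `gap`, `cover_count`, and `slope` are illustrative, not from the paper.

```python
import math


def gap(x: float) -> float:
    """Suboptimality gap of arm x under the hypothetical reward mu(x) = 1 - |x - 0.5|."""
    return abs(x - 0.5)


def cover_count(points: list[float], r: float) -> int:
    """Minimum number of closed radius-r intervals needed to cover sorted 1-D points.

    Greedy left-to-right covering is optimal in one dimension: open an
    interval [p, p + 2r] at the leftmost point not yet covered.
    """
    count = 0
    reach = -math.inf
    for p in points:
        if p > reach:
            count += 1
            reach = p + 2 * r
    return count


def slope(scales: list[float], counts: list[int]) -> float:
    """Least-squares slope of log N(r) against log(1/r): an empirical dimension."""
    xs = [math.log(1 / r) for r in scales]
    ys = [math.log(n) for n in counts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


grid = [i / 10_000 for i in range(10_001)]   # discretized arm space [0, 1]
scales = [2.0 ** -k for k in range(2, 9)]    # r = 1/4, 1/8, ..., 1/256

# Covering: cover every arm. Zooming: cover only the arms with gap <= r.
covering = [cover_count(grid, r) for r in scales]
zooming = [cover_count([x for x in grid if gap(x) <= r], r) for r in scales]

print("covering-dimension estimate:", round(slope(scales, covering), 2))
print("zooming-dimension estimate:", round(slope(scales, zooming), 2))
```

For this reward the whole arm space needs about 1/(2r) intervals at scale r, so the first estimate should come out close to 1, while the near-optimal set is a single interval of width 2r at every scale, so the second is 0. A constant reward would make every arm near-optimal, and the two estimates would coincide.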

Abstract

We study reinforcement learning in continuous state and action spaces endowed with a metric. We provide a refined analysis of a variant of the algorithm of Sinclair, Banerjee, and Yu (2019) and show that its regret scales with the zooming dimension of the instance. This parameter, which originates in the bandit literature, captures the size of the subsets of near optimal actions and is always smaller than the covering dimension used in previous analyses. As such, our results are the first provably adaptive guarantees for reinforcement learning in metric spaces.
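The abstract does not spell out the algorithm itself. The Python sketch below illustrates the general adaptive-discretization idea in a deliberately reduced form: a hypothetical single-step bandit on [0, 1] rather than episodic RL. It keeps a partition into dyadic cells, plays the cell with the largest optimistic index (empirical mean plus confidence bonus plus cell radius, where the radius term assumes a 1-Lipschitz reward), and splits a cell once it has been played about (1/r)^2 times. The bonus sqrt(2 log T / n), the Gaussian noise level 0.1, and the split threshold are assumptions chosen for illustration; this is not the authors' algorithm or code, and the names `Cell` and `zooming_bandit` are hypothetical.

```python
import math
import random
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(eq=False)  # identity equality, so list.remove drops exactly this cell
class Cell:
    """A dyadic interval [lo, lo + 2r) of the arm space [0, 1]."""
    lo: float
    r: float             # radius = half the interval width
    n: int = 0           # number of pulls of this cell
    total: float = 0.0   # sum of observed rewards

    @property
    def center(self) -> float:
        return self.lo + self.r

    def index(self, horizon: int) -> float:
        """Optimistic index: empirical mean + confidence bonus + radius (Lipschitz slack)."""
        if self.n == 0:
            return math.inf  # unplayed cells are tried first
        bonus = math.sqrt(2 * math.log(horizon) / self.n)
        return self.total / self.n + bonus + self.r


def zooming_bandit(reward: Callable[[float], float], horizon: int, seed: int = 0) -> list[Cell]:
    """Run the hypothetical adaptive-partition bandit and return the final partition."""
    rng = random.Random(seed)
    cells = [Cell(lo=0.0, r=0.5)]
    for _ in range(horizon):
        cell = max(cells, key=lambda c: c.index(horizon))
        cell.n += 1
        # Play the cell's center and observe a noisy reward (noise level is an assumption).
        cell.total += reward(cell.center) + rng.gauss(0.0, 0.1)
        # Refine once n >= (1/r)^2, i.e. once the statistical error ~ 1/sqrt(n)
        # has shrunk to the cell's radius; children start with fresh statistics.
        if cell.n >= (1.0 / cell.r) ** 2:
            cells.remove(cell)
            half = cell.r / 2
            cells.append(Cell(lo=cell.lo, r=half))
            cells.append(Cell(lo=cell.lo + cell.r, r=half))
    return cells


if __name__ == "__main__":
    final = zooming_bandit(lambda x: 1 - abs(x - 0.5), horizon=5000)
    for c in sorted(final, key=lambda cell: cell.lo):
        print(f"[{c.lo:.4f}, {c.lo + 2 * c.r:.4f})  pulls={c.n}")
```

Because a cell is refined only where it keeps being selected, the partition should end up finer near the maximizer x = 0.5 than near the endpoints; that concentration of effort on the near-optimal region is what the zooming dimension quantifies.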

Videos

Provably adaptive reinforcement learning in metric spaces (SlidesLive)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Data Stream Mining Techniques