Provably adaptive reinforcement learning in metric spaces
Tongyi Cao, Akshay Krishnamurthy

TL;DR
This paper gives a refined analysis of a reinforcement learning algorithm for continuous state and action spaces endowed with a metric. It shows that the algorithm's regret adapts to the difficulty of the instance through the zooming dimension, a complexity parameter from bandit theory.
Contribution
It provides the first provably adaptive regret guarantees for reinforcement learning in metric spaces, refining previous analyses by replacing the covering dimension with the zooming dimension.
Findings
Regret scales with the zooming dimension of the instance.
The zooming dimension is at most the covering dimension used in previous analyses, so the resulting bounds are never worse and can be much tighter.
First adaptive guarantees for RL in metric spaces.
Abstract
We study reinforcement learning in continuous state and action spaces endowed with a metric. We provide a refined analysis of a variant of the algorithm of Sinclair, Banerjee, and Yu (2019) and show that its regret scales with the zooming dimension of the instance. This parameter, which originates in the bandit literature, captures the size of the subsets of near optimal actions and is always smaller than the covering dimension used in previous analyses. As such, our results are the first provably adaptive guarantees for reinforcement learning in metric spaces.
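A minimal numerical sketch of the contrast drawn in the abstract, under stated assumptions: a single-state, single-step (bandit) toy with action space [0, 1], reward f(x) = 1 - |x - 0.5|, and a hypothetical constant c = 2 in the near-optimal set {x : gap(x) <= c * r}. The paper's reinforcement learning definition differs in its details and constants; this only illustrates the bandit-style idea. Covering all of [0, 1] with radius-r intervals takes about 1/(2r) of them (covering dimension 1), while the near-optimal set shrinks with r and needs only a constant number (zooming dimension 0). The helpers `gap`, `cover_count`, and `dimension_estimate` are illustrative names, not code from the paper.

```python
import math


def gap(x: float, x_star: float = 0.5) -> float:
    """Suboptimality gap of action x for the toy reward f(x) = 1 - |x - x_star|."""
    return abs(x - x_star)


def cover_count(points: list[float], r: float) -> int:
    """Number of radius-r intervals used by a greedy left-to-right cover.

    For points on a line, sorted ascending, this greedy cover is minimal.
    """
    count, i = 0, 0
    while i < len(points):
        left = points[i]
        count += 1
        # An interval centred at left + r covers [left, left + 2r].
        while i < len(points) and points[i] <= left + 2 * r:
            i += 1
    return count


def dimension_estimate(counts: dict[float, int]) -> float:
    """Two-point log-log slope of N(r) against 1/r between the extreme scales."""
    r_big, r_small = max(counts), min(counts)
    return math.log(counts[r_small] / counts[r_big]) / math.log(r_big / r_small)


n = 100_000
grid = [k / n for k in range(n + 1)]         # discretised action space [0, 1]
scales = [2.0 ** -k for k in range(3, 11)]   # r = 1/8, ..., 1/1024
c = 2.0  # hypothetical constant in the near-optimal set {x : gap(x) <= c * r}

# Covering numbers of the whole action space at each scale.
covering = {r: cover_count(grid, r) for r in scales}
# Covering numbers of the near-optimal subset at the matching scale.
zooming = {r: cover_count([x for x in grid if gap(x) <= c * r], r) for r in scales}

print(f"covering dimension ~ {dimension_estimate(covering):.2f}")
print(f"zooming dimension  ~ {dimension_estimate(zooming):.2f}")
```

Running it should print a covering-dimension estimate near 1 and a zooming-dimension estimate near 0. The two-point slope keeps the sketch short; a least-squares fit over all scales would be a sturdier estimator.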
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Data Stream Mining Techniques
