Provably Adaptive Average Reward Reinforcement Learning for Metric Spaces
Avik Kar, Rahul Singh

TL;DR
This paper introduces ZoRL, an adaptive reinforcement learning algorithm for Lipschitz MDPs that achieves better regret bounds by zooming into promising regions, outperforming fixed discretization methods.
Contribution
The paper develops ZoRL, an adaptive algorithm that improves regret bounds in average-reward RL for Lipschitz MDPs by dynamically discretizing the state-action space.
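To make "dynamically discretizing the state-action space" concrete, here is a minimal sketch of a generic zooming-style partition. It is not ZoRL's actual procedure. The 1-D interval standing in for the state-action space, the names `Cell`, `visit`, `leaves`, and `demo`, and the split rule (split a cell once its visit count reaches 1/diameter²) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """An interval [lo, hi) of a 1-D stand-in for the state-action space."""
    lo: float
    hi: float
    visits: int = 0
    children: list = field(default_factory=list)

    @property
    def diameter(self) -> float:
        return self.hi - self.lo

    def is_leaf(self) -> bool:
        return not self.children


def find_leaf(root: Cell, x: float) -> Cell:
    """Descend from the root to the leaf cell that contains x."""
    cell = root
    while not cell.is_leaf():
        left, right = cell.children
        cell = left if x < left.hi else right
    return cell


def visit(root: Cell, x: float) -> None:
    """Record a visit at x and refine the partition where visits accumulate."""
    cell = find_leaf(root, x)
    cell.visits += 1
    # Hypothetical split rule (an assumption, not taken from the paper):
    # split once a confidence radius ~ 1/sqrt(visits) drops to the diameter.
    if cell.visits >= 1.0 / cell.diameter ** 2:
        mid = (cell.lo + cell.hi) / 2
        cell.children = [Cell(cell.lo, mid), Cell(mid, cell.hi)]


def leaves(root: Cell) -> list:
    """Return the current partition as a list of leaf cells."""
    if root.is_leaf():
        return [root]
    return [leaf for child in root.children for leaf in leaves(child)]


def demo(num_visits: int = 100, x: float = 0.7) -> list:
    """Visit the same point repeatedly and return the resulting partition."""
    root = Cell(0.0, 1.0)
    for _ in range(num_visits):
        visit(root, x)
    return sorted((c.lo, c.hi) for c in leaves(root))
```

Calling `demo()` concentrates all visits at one point, so the partition becomes fine around that point and stays coarse elsewhere. A fixed discretization would instead use equally small cells everywhere.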
Findings
ZoRL achieves a regret bound of $\tilde{O}(T^{1 - d_{\text{eff}}^{-1}})$, improving over fixed discretization methods.
ZoRL outperforms state-of-the-art algorithms in experiments, demonstrating the benefits of adaptivity.
The algorithm effectively captures problem-specific structure through the zooming dimension, leading to smaller regret in benign MDPs.
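A short sketch of the arithmetic behind a bound of the form $T^{1 - 1/d_{\text{eff}}}$ shows how a smaller effective dimension lowers the regret exponent. Logarithmic factors are ignored. The function names and the example values of `d_eff` used below are hypothetical and are not figures from the paper.

```python
def regret_exponent(d_eff: float) -> float:
    """Exponent of T in a bound of the form T^(1 - 1/d_eff)."""
    if d_eff < 1:
        raise ValueError("d_eff must be at least 1")
    return 1.0 - 1.0 / d_eff


def regret_scale(horizon: int, d_eff: float) -> float:
    """T^(1 - 1/d_eff), ignoring constants and logarithmic factors."""
    return horizon ** regret_exponent(d_eff)


def compare(horizon: int, d_eff_adaptive: float, d_eff_fixed: float) -> float:
    """Ratio of the adaptive scale to the fixed scale; below 1 favors adaptivity."""
    return regret_scale(horizon, d_eff_adaptive) / regret_scale(horizon, d_eff_fixed)
```

For example, with illustrative values `d_eff_adaptive=5` and `d_eff_fixed=8`, the exponents are 0.8 and 0.875, and the gap between the two scales widens as the horizon grows.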
Abstract
We study infinite-horizon average-reward reinforcement learning (RL) for Lipschitz MDPs, a broad class that subsumes several important classes such as linear and RKHS MDPs and function approximation frameworks, and develop an adaptive algorithm, ZoRL, with regret bounded as $\tilde{O}(T^{1 - d_{\text{eff}}^{-1}})$, where the effective dimension $d_{\text{eff}}$ depends on the dimension of the state space and on the zooming dimension. In contrast, for algorithms with fixed discretization, $d_{\text{eff}}$ depends on the dimension of the action space. ZoRL achieves this by discretizing the state-action space adaptively and zooming into "promising regions" of the state-action space. The zooming dimension, a problem-dependent quantity bounded by the state-action space's dimension, allows us to conclude that if an…
