Policy Zooming: Adaptive Discretization-based Infinite-Horizon Average-Reward Reinforcement Learning

Avik Kar; Rahul Singh

arXiv:2405.18793·cs.LG·November 18, 2025

TL;DR

This paper introduces adaptive algorithms for infinite-horizon average-reward reinforcement learning in continuous spaces, leveraging zooming techniques to efficiently explore policy spaces and achieve low regret, especially in low-complexity scenarios.

Contribution

The paper proposes novel zooming-based algorithms for RL that adaptively focus on promising policies, providing regret bounds dependent on a new complexity measure called zooming dimension.

Findings

01

Regret is bounded by $\tilde{O}(T^{1 - d_{eff.}^{-1}})$, so the exponent moves toward 1 as the effective dimension $d_{eff.}$ grows.

02

Algorithms perform well in low-complexity or benign problem instances.

03

Specialization yields $\tilde{O}(\sqrt{T})$ regret under certain conditions.

Abstract

We study infinite-horizon average-reward reinforcement learning (RL) for continuous-space Lipschitz MDPs in which an agent can play policies from a given set $Φ$. The proposed algorithms efficiently explore the policy space by "zooming" into the "promising regions" of $Φ$, thereby achieving adaptivity gains in performance. We upper bound their regret as $\tilde{O}(T^{1 - d_{eff.}^{-1}})$, where $d_{eff.} = d_{z}^{Φ} + 2$ for the model-free algorithm PZRL-MF and $d_{eff.} = 2 d_{S} + d_{z}^{Φ} + 3$ for the model-based algorithm PZRL-MB. Here, $d_{S}$ is the dimension of the state space, and $d_{z}^{Φ}$ is the zooming dimension given a set of policies $Φ$. $d_{z}^{Φ}$ is an alternative measure of the complexity of the problem, and it depends on the underlying MDP as well as on $Φ$. Hence, the proposed…
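To make the stated bounds concrete, the following is a minimal sketch that evaluates the regret exponent $1 - d_{eff.}^{-1}$ from the two formulas quoted in the abstract. The function names `effective_dimension` and `regret_exponent` are hypothetical helpers introduced here for illustration; they are not part of the paper. The sketch ignores the logarithmic factors and constants hidden by $\tilde{O}$.

```python
def effective_dimension(d_z, d_s=None, model_based=False):
    """Effective dimension d_eff as stated in the abstract.

    d_z: zooming dimension of the policy set (d_z^Phi).
    d_s: state-space dimension (only needed for the model-based case).
    Hypothetical helper for illustration; not code from the paper.
    """
    if model_based:
        if d_s is None:
            raise ValueError("d_s is required for the model-based bound")
        return 2 * d_s + d_z + 3  # PZRL-MB: d_eff = 2 d_S + d_z + 3
    return d_z + 2  # PZRL-MF: d_eff = d_z + 2


def regret_exponent(d_eff):
    """Exponent of T in the bound T^(1 - 1/d_eff), ignoring log factors."""
    return 1.0 - 1.0 / d_eff


# Model-free, zooming dimension 0: d_eff = 2, exponent 1/2 (sqrt(T) regret).
mf = regret_exponent(effective_dimension(d_z=0))

# Model-based, d_S = 2 and zooming dimension 1: d_eff = 8, exponent 7/8.
mb = regret_exponent(effective_dimension(d_z=1, d_s=2, model_based=True))

print(mf, mb)
```

Under these formulas, a zooming dimension of 0 in the model-free case gives $d_{eff.} = 2$ and therefore an exponent of 1/2, which matches the $\tilde{O}(\sqrt{T})$ rate listed in the findings.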

Taxonomy

Topics: Optimization and Variational Analysis · Smart Parking Systems Research · Distributed Control Multi-Agent Systems

Methods: Sparse Evolutionary Training