Policy Zooming: Adaptive Discretization-based Infinite-Horizon Average-Reward Reinforcement Learning
Avik Kar, Rahul Singh

TL;DR
This paper introduces adaptive algorithms for infinite-horizon average-reward reinforcement learning in continuous spaces, leveraging zooming techniques to efficiently explore policy spaces and achieve low regret, especially in low-complexity scenarios.
Contribution
The paper proposes novel zooming-based algorithms for RL that adaptively focus on promising policies. Their regret bounds depend on a new complexity measure called the zooming dimension.
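The "zooming" idea comes from Lipschitz multi-armed bandits. To show what adaptively focusing on promising policies means, here is a minimal sketch of the classic zooming algorithm of Kleinberg, Slivkins and Upfal. It is a bandit analogue, not the paper's RL algorithms. It assumes each policy is indexed by a scalar parameter x in [0, 1], that the policy's average reward is 1-Lipschitz in x, and that each play returns a noisy reward in [0, 1]. The names `zooming`, `noisy_reward` and `grid_size` are hypothetical. Searching for uncovered points on a finite grid is a simplification made here.

```python
import math
import random


def zooming(reward_fn, horizon, grid_size=101, seed=0):
    """Sketch of the zooming algorithm for a 1-Lipschitz bandit on [0, 1].

    Arms stand in for policy parameters x in [0, 1] (an assumption for
    illustration). reward_fn(x, rng) returns a noisy reward in [0, 1].
    Uncovered points are searched on a finite grid, a simplification.
    """
    rng = random.Random(seed)
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    counts, sums = {}, {}  # active arm -> number of plays, sum of rewards
    log_t = math.log(horizon)

    def radius(x):
        # Confidence radius: shrinks as arm x is played more often.
        return math.sqrt(2 * log_t / (1 + counts[x]))

    def index(x):
        # Optimistic index: empirical mean plus twice the confidence radius.
        mean = sums[x] / counts[x] if counts[x] else 0.0
        return mean + 2 * radius(x)

    for _ in range(horizon):
        radii = {x: radius(x) for x in counts}
        # Activation rule: activate the first grid point not covered by any
        # active arm's confidence ball [x - r, x + r].
        for y in grid:
            if all(abs(y - x) > r for x, r in radii.items()):
                counts[y], sums[y] = 0, 0.0
                break
        # Selection rule: play the active arm with the largest index.
        arm = max(counts, key=index)
        counts[arm] += 1
        sums[arm] += reward_fn(arm, rng)
    return counts


def noisy_reward(x, rng):
    # Hypothetical 1-Lipschitz reward curve peaked at x = 0.7,
    # plus Gaussian noise, clipped to [0, 1].
    return min(1.0, max(0.0, 1.0 - abs(x - 0.7) + rng.gauss(0.0, 0.1)))


plays = zooming(noisy_reward, horizon=1000)
most_played = max(plays, key=plays.get)
```

Confidence balls shrink only around arms that keep being played. As a result, new arms tend to be activated near high-reward regions, so the discretization becomes finer where it matters instead of uniformly.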
Findings
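Regret bounds are of order $\tilde{O}(T^{1-d_{\text{eff.}}^{-1}})$, so the exponent of $T$ is governed by the inverse of the effective dimension $d_{\text{eff.}}$.
The algorithms achieve low regret on benign problem instances or when competing against a low-complexity policy set.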
Specialization yields $\tilde{O}(\sqrt{T})$ regret under certain conditions.
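The link between the effective dimension and the $\tilde{O}(\sqrt{T})$ special case is simple arithmetic. The following minimal sketch assumes a bound of order $T^{1-1/d_{\text{eff.}}}$ and drops constants and logarithmic factors. The helper names `regret_exponent` and `regret_order` are hypothetical.

```python
from fractions import Fraction


def regret_exponent(d_eff):
    """Exponent a in a regret bound of order T**a, with a = 1 - 1/d_eff.

    Constants and log factors are ignored (an assumption of this sketch).
    """
    return 1 - Fraction(1, d_eff)


def regret_order(T, d_eff):
    """Numerical size of T**(1 - 1/d_eff), ignoring constants and log factors."""
    return T ** float(regret_exponent(d_eff))


for d in (2, 3, 5, 10):
    print(d, regret_exponent(d))
```

With $d_{\text{eff.}} = 2$ the exponent is $1/2$, which is the $\tilde{O}(\sqrt{T})$ case. Larger effective dimensions push the exponent toward 1: the regret stays sublinear, but the per-step average regret $T^{-1/d_{\text{eff.}}}$ vanishes more slowly.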
Abstract
We study the infinite-horizon average-reward reinforcement learning (RL) for continuous space Lipschitz MDPs in which an agent can play policies from a given set $\Phi$. The proposed algorithms efficiently explore the policy space by "zooming" into the "promising regions" of $\Phi$, thereby achieving adaptivity gains in the performance. We upper bound their regret as $\tilde{O}(T^{1-d_{\text{eff.}}^{-1}})$, where $d_{\text{eff.}} = d^{\Phi}_z + 2$ for the model-free algorithm and $d_{\text{eff.}} = 2d_{\mathcal{S}} + d^{\Phi}_z + 3$ for the model-based algorithm. Here, $d_{\mathcal{S}}$ is the dimension of the state space, and $d^{\Phi}_z$ is the zooming dimension given a set of policies $\Phi$. $d^{\Phi}_z$ is an alternative measure of the complexity of the problem, and it depends on the underlying MDP as well as on $\Phi$. Hence, the proposed…
Taxonomy
Topics: Optimization and Variational Analysis · Smart Parking Systems Research · Distributed Control Multi-Agent Systems
Methods: Sparse Evolutionary Training
