Achieving $\varepsilon^{-2}$ Dependence for Average-Reward Q-Learning with a New Contraction Principle
Zijun Chen, Zaiwei Chen, Nian Si, Shengbo Wang

TL;DR
This paper introduces a new contraction principle for average-reward Q-learning. Under a reachability assumption, it achieves optimal $\varepsilon^{-2}$ sample complexity (up to logarithmic factors) without the strong assumptions used in prior work, by lazifying the MDP so that the Bellman operator contracts under a new instance-dependent seminorm.
Contribution
It develops a new contraction principle for average-reward Q-learning using a lazy transformation, enabling optimal convergence rates without strong assumptions.
Findings
Achieves $\varepsilon^{-2}$ sample complexity for average-reward Q-learning.
Introduces a new instance-dependent seminorm for analysis.
Establishes contraction of the Bellman operator after a lazy transformation.
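For orientation, the lazy transformation named in the findings can be written out as below. This is a sketch of the standard lazification of a transition kernel, not the paper's exact definition: the laziness parameter $\alpha \in (0,1)$ and the symbols $P_\alpha$, $P^\pi$, $\mu$ are notation assumed here, and the summary above does not say whether the paper also modifies the rewards.

```latex
% Lazified kernel: with probability \alpha stay in the current state,
% otherwise follow the original dynamics (notation assumed, not from the paper).
P_\alpha(s' \mid s, a) \;=\; \alpha\,\mathbf{1}\{s' = s\} \;+\; (1-\alpha)\,P(s' \mid s, a).

% For a stationary policy \pi this gives P_\alpha^\pi = \alpha I + (1-\alpha) P^\pi,
% so any stationary distribution \mu of P^\pi is also stationary for P_\alpha^\pi:
\mu P_\alpha^\pi \;=\; \alpha\,\mu + (1-\alpha)\,\mu P^\pi \;=\; \mu .
```

With rewards left unchanged, the long-run average reward of each stationary policy is therefore the same in the original and the lazified chain, which is why sampling from the lazy dynamics can still target the original average-reward problem.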
Abstract
We present the convergence rates of synchronous and asynchronous Q-learning for average-reward Markov decision processes, where the absence of contraction poses a fundamental challenge. Existing non-asymptotic results overcome this challenge by either imposing strong assumptions to enforce seminorm contraction or relying on discounted or episodic Markov decision processes as successive approximations, which either require unknown parameters or result in suboptimal sample complexity. In this work, under a reachability assumption, we establish optimal sample complexity guarantees (up to logarithmic factors) for a simple variant of synchronous and asynchronous Q-learning that samples from the lazified dynamics, where the system remains in the current state with some fixed probability. At the core of our analysis is the construction of an instance-dependent…
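The lazified dynamics described in the abstract can be illustrated with a short sketch. It assumes a finite state space and a laziness probability `alpha`. The helper names `lazify`, `sample_lazy_next_state`, and `stationary_distribution` are hypothetical and do not come from the paper. The example only shows how one might sample from lazy dynamics and check that lazification leaves the stationary distribution of a fixed policy's chain unchanged. It is not the paper's Q-learning algorithm.

```python
import numpy as np

def lazify(P, alpha):
    """Return the lazy kernel alpha*I + (1-alpha)*P for a row-stochastic matrix P."""
    P = np.asarray(P, dtype=float)
    return alpha * np.eye(P.shape[0]) + (1.0 - alpha) * P

def sample_lazy_next_state(rng, s, draw_next, alpha):
    """With probability alpha stay in s; otherwise call draw_next() for an
    ordinary sample from the original dynamics (draw_next is a hypothetical callable)."""
    if rng.random() < alpha:
        return s
    return draw_next()

def stationary_distribution(P):
    """Solve mu P = mu, sum(mu) = 1 by least squares.
    Assumes the chain has a unique stationary distribution."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    mu, *_ = np.linalg.lstsq(A, b, rcond=None)
    return mu

# A periodic two-state chain: the lazy version becomes aperiodic
# but keeps the same stationary distribution.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
P_lazy = lazify(P, alpha=0.5)
mu = stationary_distribution(P)
mu_lazy = stationary_distribution(P_lazy)
```

In a sample-based setting only `sample_lazy_next_state` would be needed, since it wraps an existing simulator call without requiring access to the transition matrix.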
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Robot Manipulation and Learning
