Achieving $\varepsilon^{-2}$ Dependence for Average-Reward Q-Learning with a New Contraction Principle

Zijun Chen; Zaiwei Chen; Nian Si; Shengbo Wang

arXiv:2601.21301·cs.LG·January 30, 2026

TL;DR

This paper introduces a new contraction principle for average-reward Q-learning. Under a reachability assumption, it achieves the optimal $\varepsilon^{-2}$ sample complexity (up to logarithmic factors) by lazifying the MDP dynamics so that the Bellman operator contracts under a novel instance-dependent seminorm.

Contribution

It develops a new contraction principle for average-reward Q-learning: after a lazy transformation of the dynamics, the Bellman operator contracts under an instance-dependent seminorm, which yields optimal sample complexity under a reachability assumption.

Findings

01

Achieves $\varepsilon^{-2}$ sample complexity for average-reward Q-learning.

02

Introduces a new instance-dependent seminorm for analysis.

03

Establishes contraction of the Bellman operator after a lazy transformation.

Abstract

We present the convergence rates of synchronous and asynchronous Q-learning for average-reward Markov decision processes, where the absence of contraction poses a fundamental challenge. Existing non-asymptotic results overcome this challenge by either imposing strong assumptions to enforce seminorm contraction or relying on discounted or episodic Markov decision processes as successive approximations, which either require unknown parameters or result in suboptimal sample complexity. In this work, under a reachability assumption, we establish optimal $O(\varepsilon^{-2})$ sample complexity guarantees (up to logarithmic factors) for a simple variant of synchronous and asynchronous Q-learning that samples from the lazified dynamics, where the system remains in the current state with some fixed probability. At the core of our analysis is the construction of an instance-dependent…
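
The abstract describes lazified dynamics only in words: the system stays in its current state with some fixed probability and otherwise follows the original transition. The sketch below illustrates that one idea for a tabular MDP. It is not the paper's algorithm. The function names, the array layout `P[s, a, s']`, and the laziness parameter `alpha` are all assumptions introduced here for illustration.

```python
import numpy as np


def lazify(P, alpha):
    """Return a lazified kernel: alpha * (stay in place) + (1 - alpha) * P.

    P has shape (S, A, S) with P[s, a, s2] = Pr(next = s2 | state = s, action = a).
    alpha in [0, 1] is a hypothetical name for the fixed "stay" probability.
    """
    num_states = P.shape[0]
    stay = np.zeros_like(P)
    idx = np.arange(num_states)
    stay[idx, :, idx] = 1.0  # stay[s, a, s] = 1 for every action a
    return alpha * stay + (1.0 - alpha) * P


def sample_lazy_next_state(rng, P, s, a, alpha):
    """Draw one next state from the lazified dynamics without building the full kernel."""
    if rng.random() < alpha:
        return s  # lazy step: remain in the current state
    return int(rng.choice(P.shape[2], p=P[s, a]))  # otherwise follow the original kernel


# Usage on a small random MDP (3 states, 2 actions).
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=(3, 2))  # each P[s, a] is a probability vector
P_lazy = lazify(P, alpha=0.5)
next_state = sample_lazy_next_state(rng, P, s=0, a=1, alpha=0.5)
```

Sampling with a coin flip, as in `sample_lazy_next_state`, only needs access to samples from the original dynamics, which is why it may fit a Q-learning loop more naturally than forming `P_lazy` explicitly.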

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Robot Manipulation and Learning