Parameter-free Optimal Rates for Nonlinear Semi-Norm Contractions with Applications to $Q$-Learning
Ankur Naskar, Gugan Thoppe, Vijay Gupta

TL;DR
This paper establishes the first parameter-free optimal convergence rates for nonlinear semi-norm contraction algorithms such as $Q$-learning, in both average-reward and discounted settings, covering synchronous, asynchronous, and distributed variants.
Contribution
It introduces a novel analysis framework that achieves parameter-free $\tilde{O}(1/\sqrt{t})$ rates for $Q$-learning, overcoming non-monotonicity challenges of semi-norms.
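One way to picture the non-monotonicity obstacle is with the span semi-norm, which is commonly used in average-reward problems. The sketch below is illustrative only: this summary does not name the semi-norm the paper works with, and the helpers `span`, `sup_norm`, and `dominated` are hypothetical names. It shows two vectors with identical coordinate-wise magnitudes whose spans differ, which cannot happen for a monotone norm such as the sup-norm.

```python
def span(x):
    """Span semi-norm: max(x) - min(x). It vanishes on constant vectors."""
    return max(x) - min(x)


def sup_norm(x):
    """Sup-norm: largest absolute coordinate. This norm is monotone."""
    return max(abs(v) for v in x)


def dominated(x, y):
    """True if |x_i| <= |y_i| holds for every coordinate i."""
    return all(abs(a) <= abs(b) for a, b in zip(x, y))


x = [1.0, -1.0]
y = [1.0, 1.0]

# x is dominated by y coordinate-wise, yet span(x) exceeds span(y).
print(dominated(x, y), span(x), span(y))          # monotonicity fails for span
print(dominated(x, y), sup_norm(x), sup_norm(y))  # sup-norm respects the ordering
```

Arguments that bound an error vector coordinate-wise and then take norms rely on monotonicity, so they do not transfer directly to a semi-norm like this one.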
Findings
First parameter-free optimal rates for $Q$-learning established.
Applicable to both average-reward and discounted settings.
Works for synchronous, asynchronous, and distributed algorithms.
Abstract
Algorithms for solving \textit{nonlinear} fixed-point equations -- such as average-reward \textit{$Q$-learning} and \textit{TD-learning} -- often involve semi-norm contractions. Achieving parameter-free optimal convergence rates for these methods via Polyak--Ruppert averaging has remained elusive, largely due to the non-monotonicity of such semi-norms. We close this gap by (i.) recasting the averaged error as a linear recursion involving a nonlinear perturbation, and (ii.) taming the nonlinearity by coupling the semi-norm's contraction with the monotonicity of a suitably induced norm. Our main result yields the first parameter-free $\tilde{O}(1/\sqrt{t})$ optimal rates for $Q$-learning in both average-reward and exponentially discounted settings, where $t$ denotes the iteration index. The result applies within a broad framework that accommodates synchronous and asynchronous updates,…
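Polyak--Ruppert averaging, as mentioned in the abstract, can be sketched on a toy problem. The Python example below is not the paper's algorithm or analysis: it runs synchronous discounted $Q$-learning on a hypothetical 2-state, 2-action MDP (the tables `P` and `R` are made up), uses an assumed step size $t^{-0.75}$ that does not depend on the discount factor or any other problem constant, and returns the running average of the iterates.

```python
import random

GAMMA = 0.5  # discount factor of the toy MDP; the step size never uses it

# Hypothetical MDP: P[s][a] is the probability of moving to state 0.
P = [[0.9, 0.2], [0.5, 0.1]]
R = [[1.0, 0.0], [0.0, 0.5]]  # deterministic reward for each (state, action)


def q_star(iters=200):
    """Reference solution via value iteration with the known model."""
    q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(iters):
        q = [[R[s][a] + GAMMA * (P[s][a] * max(q[0])
                                 + (1.0 - P[s][a]) * max(q[1]))
              for a in range(2)] for s in range(2)]
    return q


def averaged_q_learning(steps=20000, seed=0):
    """Synchronous Q-learning; returns the Polyak--Ruppert average."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]
    q_bar = [[0.0, 0.0], [0.0, 0.0]]
    for t in range(1, steps + 1):
        alpha = t ** -0.75  # assumed schedule, free of problem constants
        new_q = [row[:] for row in q]
        for s in range(2):
            for a in range(2):
                # Sample one next state and form the noisy Bellman target.
                s_next = 0 if rng.random() < P[s][a] else 1
                target = R[s][a] + GAMMA * max(q[s_next])
                new_q[s][a] = q[s][a] + alpha * (target - q[s][a])
        q = new_q
        # Running mean of the iterates q_1, ..., q_t.
        for s in range(2):
            for a in range(2):
                q_bar[s][a] += (q[s][a] - q_bar[s][a]) / t
    return q_bar


if __name__ == "__main__":
    print(averaged_q_learning())
    print(q_star())
```

The averaged iterate is what the stated rate refers to; the raw iterate `q` is typically noisier under a slowly decaying step size like the one assumed here.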
Taxonomy
Topics: Machine Learning and ELM · Domain Adaptation and Few-Shot Learning · Sparse and Compressive Sensing Techniques
