A General-Purpose Theorem for High-Probability Bounds of Stochastic Approximation with Polyak Averaging

Sajad Khodadadian; Martin Zubeldia

arXiv:2505.21796 · stat.ML · May 29, 2025

TL;DR

This paper develops a general framework to establish high-probability bounds for stochastic approximation algorithms with Polyak-Ruppert averaging, providing sharp non-asymptotic concentration results applicable to various algorithms.

Contribution

The paper introduces a unified approach for deriving non-asymptotic concentration bounds on averaged stochastic approximation iterates, and extends the analysis to algorithms such as temporal difference (TD) learning and Q-learning.

Findings

01 Derived tight concentration bounds for contractive SA algorithms

02 Extended bounds to temporal difference and Q-learning with averaging

03 Showed, through a tightness example, that the bounds are sharp up to constant multiplicative factors
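
As a rough illustration of the second finding, the sketch below runs tabular TD(0) with Polyak-Ruppert averaging on a small Markov reward process. The two-state chain, the rewards, the discount factor, the step-size schedule k^(-0.6), and the function name `averaged_td0` are all assumptions chosen for illustration; none of them come from the paper.

```python
import numpy as np


def averaged_td0(num_steps, seed=0, discount=0.5):
    """Tabular TD(0) on a toy 2-state chain, with a running Polyak-Ruppert average.

    Everything here (chain, rewards, step sizes) is a hypothetical setup.
    Returns the last iterate, the averaged iterate, and the true values.
    """
    rng = np.random.default_rng(seed)
    P = np.array([[0.9, 0.1], [0.2, 0.8]])  # assumed transition matrix
    r = np.array([1.0, 0.0])                # assumed reward per state

    # True value function solves (I - discount * P) v = r.
    v_true = np.linalg.solve(np.eye(2) - discount * P, r)

    v = np.zeros(2)      # unaveraged TD iterate
    v_bar = np.zeros(2)  # running average of the iterates
    s = 0
    for k in range(1, num_steps + 1):
        s_next = rng.choice(2, p=P[s])
        step = 1.0 / k ** 0.6  # assumed polynomial step size
        td_error = r[s] + discount * v[s_next] - v[s]
        v[s] += step * td_error
        v_bar += (v - v_bar) / k  # v_bar is the mean of the first k iterates
        s = s_next
    return v, v_bar, v_true
```

Calling `averaged_td0(20000)` and comparing `np.linalg.norm(v_bar - v_true)` with `np.linalg.norm(v - v_true)` gives a quick empirical feel for the effect of averaging on this toy problem; it says nothing about the constants in the paper's bounds.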

Abstract

Polyak-Ruppert averaging is a widely used technique to achieve the optimal asymptotic variance of stochastic approximation (SA) algorithms, yet its high-probability performance guarantees remain underexplored in general settings. In this paper, we present a general framework for establishing non-asymptotic concentration bounds for the error of averaged SA iterates. Our approach assumes access to individual concentration bounds for the unaveraged iterates and yields a sharp bound on the averaged iterates. We also construct an example, showing the tightness of our result up to constant multiplicative factors. As direct applications, we derive tight concentration bounds for contractive SA algorithms and for algorithms such as temporal difference learning and Q-learning with averaging, obtaining new bounds in settings where traditional analysis is challenging.
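
To make the objects in the abstract concrete, here is a minimal sketch of a contractive SA recursion together with its Polyak-Ruppert average. The linear contraction F(x) = gamma * x + b, the Gaussian noise, the step size k^(-0.6), and the function name `averaged_sa` are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def averaged_sa(num_steps, seed=0, gamma=0.5, noise_std=1.0):
    """Run x_{k+1} = x_k + a_k * (F(x_k) - x_k + w_k) with a running average.

    F(x) = gamma * x + b is an assumed toy contraction with fixed point
    x_star = b / (1 - gamma). Returns (last iterate, averaged iterate, x_star).
    """
    rng = np.random.default_rng(seed)
    b = np.array([1.0, -2.0, 0.5])
    x_star = b / (1.0 - gamma)

    x = np.zeros_like(b)      # unaveraged iterate
    x_bar = np.zeros_like(b)  # Polyak-Ruppert average of the iterates
    for k in range(1, num_steps + 1):
        step = 1.0 / k ** 0.6  # assumed step-size schedule
        noise = noise_std * rng.standard_normal(b.shape)
        x = x + step * (gamma * x + b - x + noise)
        x_bar += (x - x_bar) / k  # x_bar is the mean of the first k iterates
    return x, x_bar, x_star


if __name__ == "__main__":
    runs = [averaged_sa(5000, seed=s) for s in range(20)]
    last_err = np.mean([np.linalg.norm(x - xs) for x, _, xs in runs])
    avg_err = np.mean([np.linalg.norm(xb - xs) for _, xb, xs in runs])
    print(f"mean error, last iterate:     {last_err:.4f}")
    print(f"mean error, averaged iterate: {avg_err:.4f}")
```

Repeating the run over many seeds and looking at the upper quantiles of the averaged error, rather than its mean, is the empirical counterpart of a high-probability bound; the script only reports means to stay short.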

Videos

A General-Purpose Theorem for High-Probability Bounds of Stochastic Approximation with Polyak Averaging (SlidesLive)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Simulation Techniques and Applications · Reinforcement Learning in Robotics

Methods: Q-Learning