A Statistical Analysis of Polyak-Ruppert Averaged Q-learning

Xiang Li; Wenhao Yang; Jiadong Liang; Zhihua Zhang; Michael I. Jordan

arXiv:2112.14582 · stat.ML · February 21, 2023

TL;DR

This paper provides a detailed statistical analysis of Polyak-Ruppert averaged Q-learning, establishing a functional central limit theorem, online inference methods, and optimal error bounds in tabular Markov decision processes.

Contribution

The paper establishes a functional CLT for averaged Q-learning, shows that the averaged iterate is asymptotically efficient as a regular asymptotically linear (RAL) estimator, and derives nonasymptotic error bounds. The analysis also extends to entropy-regularized Q-learning.

Findings

01. Functional CLT for the averaged Q-learning process

02. Fully online inference method based on the functional CLT

03. Nonasymptotic $\ell_\infty$ error bound that matches the instance-dependent lower bound for polynomial step sizes

Abstract

We study Q-learning with Polyak-Ruppert averaging in a discounted Markov decision process in synchronous and tabular settings. Under a Lipschitz condition, we establish a functional central limit theorem for the averaged iteration $\bar{Q}_T$ and show that its standardized partial-sum process converges weakly to a rescaled Brownian motion. The functional central limit theorem implies a fully online inference method for reinforcement learning. Furthermore, we show that $\bar{Q}_T$ is the regular asymptotically linear (RAL) estimator for the optimal Q-value function $Q^*$ that has the most efficient influence function. We present a nonasymptotic analysis for the $\ell_\infty$ error, $\mathbb{E}\|\bar{Q}_T - Q^*\|_\infty$, showing that it matches the instance-dependent lower bound for polynomial step sizes. Similar results…
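To make the object of study concrete, the following is a minimal sketch of synchronous tabular Q-learning with Polyak-Ruppert averaging. It is not taken from the paper or its repository. It assumes a known transition tensor `P` used only as a sampler, deterministic rewards `R`, and a polynomial step size $\eta_t = t^{-\alpha}$ with an illustrative default $\alpha = 0.7$. The function names `averaged_q_learning` and `optimal_q` are hypothetical. Each iteration draws one next state for every state-action pair, applies the empirical Bellman update, and maintains the running average $\bar{Q}_T = \frac{1}{T}\sum_{t=1}^{T} Q_t$.

```python
import numpy as np


def averaged_q_learning(P, R, gamma, T, alpha=0.7, seed=0):
    """Synchronous tabular Q-learning with Polyak-Ruppert averaging.

    P: (S, A, S) transition probabilities, used only to sample next states.
    R: (S, A) rewards (assumed deterministic in this sketch).
    Returns the last iterate Q_T and the averaged iterate Q_bar_T.
    """
    rng = np.random.default_rng(seed)
    S, A, _ = P.shape
    cdf = np.cumsum(P, axis=-1)
    Q = np.zeros((S, A))
    Q_bar = np.zeros((S, A))
    for t in range(1, T + 1):
        # Synchronous setting: sample one next state for every (s, a) pair
        # by inverse-CDF sampling from P[s, a, :].
        u = rng.random((S, A, 1))
        next_state = np.minimum((u > cdf).sum(axis=-1), S - 1)
        # Empirical Bellman target: r(s, a) + gamma * max_a' Q(s', a').
        target = R + gamma * Q.max(axis=1)[next_state]
        eta = t ** (-alpha)  # polynomial step size (assumed form)
        Q = (1.0 - eta) * Q + eta * target
        # Running mean of the iterates Q_1, ..., Q_t.
        Q_bar += (Q - Q_bar) / t
    return Q, Q_bar


def optimal_q(P, R, gamma, iters=2000):
    """Reference Q* via exact value iteration (requires knowing P)."""
    Q = np.zeros_like(R, dtype=float)
    for _ in range(iters):
        Q = R + gamma * P @ Q.max(axis=1)
    return Q
```

On a small randomly generated MDP, comparing `Q_bar` against `optimal_q(P, R, gamma)` in the sup norm gives an empirical view of the $\ell_\infty$ error that the abstract analyzes. The exact value iteration is only a reference for checking and is not part of the learning procedure.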

Code & Models

Repositories

lx10077/AveQLearning (Official)

Taxonomy

Topics: Control Systems and Identification · Reinforcement Learning in Robotics · Receptor Mechanisms and Signaling

Methods: Q-Learning