Asymptotic Analysis of Sample-averaged Q-learning

Saunak Kumar Panda; Ruiqi Liu; Yisha Xiang

arXiv:2410.10737·cs.LG·February 28, 2025

TL;DR

This paper develops a theoretical framework for sample-averaged Q-learning, analyzing its asymptotic properties and proposing methods for confidence interval estimation, with empirical validation in standard RL environments.

Contribution

The paper introduces a generalized asymptotic analysis of sample-averaged Q-learning and proposes an interval estimation method that requires no extra hyperparameters.

Findings

1. Asymptotic normality of sample-averaged Q-learning is established.
2. Effective batch scheduling strategies improve learning efficiency.
3. Confidence intervals accurately reflect uncertainty in RL estimates.

Abstract

Reinforcement learning (RL) has emerged as a key approach for training agents in complex and uncertain environments. Incorporating statistical inference in RL algorithms is essential for understanding and managing uncertainty in model performance. This paper introduces a generalized framework for time-varying batch-averaged Q-learning, termed sample-averaged Q-learning (SA-QL), which extends traditional single-sample Q-learning by aggregating samples of rewards and next states to better account for data variability and uncertainty. We leverage the functional central limit theorem (FCLT) to establish a novel framework that provides insights into the asymptotic normality of the sample-averaged algorithm under mild conditions. Additionally, we develop a random scaling method for interval estimation, enabling the construction of confidence intervals without requiring extra hyperparameters.…
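
The batch-averaging idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a small tabular MDP with a generative model, synchronous updates over all state-action pairs, Gaussian reward noise, and a polynomially decaying step size. The names `sample_averaged_q_learning`, `q_star`, `batch_size`, `step_size`, and `reward_noise` are hypothetical, chosen only for this illustration.

```python
import numpy as np


def q_star(P, R, gamma, n_sweeps=500):
    """Reference optimal Q-values via value iteration (needs the true model)."""
    Q = np.zeros_like(R, dtype=float)
    for _ in range(n_sweeps):
        # P has shape (S, A, S); Q.max(axis=1) has shape (S,)
        Q = R + gamma * P @ Q.max(axis=1)
    return Q


def sample_averaged_q_learning(
    P,
    R,
    gamma=0.5,
    n_iters=2000,
    batch_size=lambda t: 2 + t // 500,          # assumed time-varying schedule
    step_size=lambda t: 1.0 / (1 + t) ** 0.7,   # assumed decaying step size
    reward_noise=0.1,                           # assumed Gaussian reward noise
    seed=0,
):
    """Synchronous Q-learning whose target averages a batch of samples."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    Q = np.zeros((n_states, n_actions))
    for t in range(n_iters):
        B = batch_size(t)
        target = np.empty_like(Q)
        for s in range(n_states):
            for a in range(n_actions):
                # Draw B next states and B noisy rewards for the pair (s, a).
                next_states = rng.choice(n_states, size=B, p=P[s, a])
                rewards = R[s, a] + reward_noise * rng.standard_normal(B)
                # Average the empirical Bellman targets over the batch.
                target[s, a] = np.mean(rewards + gamma * Q[next_states].max(axis=1))
        Q += step_size(t) * (target - Q)
    return Q


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    P = rng.dirichlet(np.ones(3), size=(3, 2))  # random transition kernel
    R = rng.uniform(size=(3, 2))                # mean rewards in [0, 1]
    Q_hat = sample_averaged_q_learning(P, R, gamma=0.5)
    print("max abs error:", np.max(np.abs(Q_hat - q_star(P, R, gamma=0.5))))
```

With a constant batch size of 1, the sketch reduces to single-sample synchronous Q-learning; larger batches reduce the variance of each target at the cost of more samples per iteration.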

Taxonomy

Topics: Fault Detection and Control Systems · Face and Expression Recognition · Neural Networks and Applications

Methods: Random Scaling · Q-Learning