Asymptotic Analysis of Sample-averaged Q-learning
Saunak Kumar Panda, Ruiqi Liu, Yisha Xiang

TL;DR
This paper develops a theoretical framework for sample-averaged Q-learning, analyzing its asymptotic properties and proposing methods for confidence interval estimation, with empirical validation in standard RL environments.
Contribution
The paper introduces a general asymptotic analysis of sample-averaged Q-learning and proposes a random scaling method for interval estimation that requires no extra hyperparameters.
Findings
Asymptotic normality of sample-averaged Q-learning is established under mild conditions.
Effective batch scheduling strategies improve learning efficiency.
Confidence intervals accurately reflect uncertainty in RL estimates.
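To illustrate how a confidence interval can be built without tuning extra hyperparameters, here is a minimal sketch of the standard random-scaling construction for averaged stochastic-approximation iterates. It is not taken from the paper: the function name `random_scaling_ci` and its arguments are hypothetical, and the paper's exact statistic (in particular with time-varying batch sizes) may differ. The critical value is passed in explicitly because the limiting distribution is not standard normal; a value of about 6.747 is the one usually reported for a two-sided 95% interval in the random-scaling literature.

```python
import numpy as np

def random_scaling_ci(iterates, critical_value):
    """Confidence interval for the limit of a scalar iterate sequence.

    iterates: 1-D sequence theta_1..theta_n, e.g. one Q(s, a) entry
              recorded at every iteration (an assumption of this sketch).
    critical_value: quantile of the non-normal limiting distribution.
    """
    theta = np.asarray(iterates, dtype=float)
    n = theta.size
    s = np.arange(1, n + 1)

    # Running averages theta_bar_s = (1/s) * sum_{i<=s} theta_i.
    running_mean = np.cumsum(theta) / s
    theta_bar = running_mean[-1]

    # Random-scaling quantity:
    # V_n = (1/n^2) * sum_s s^2 * (theta_bar_s - theta_bar_n)^2.
    # It is computed from the trajectory itself, so no bandwidth or
    # batch-size parameter has to be chosen.
    V = np.sum((s * (running_mean - theta_bar)) ** 2) / n**2

    half_width = critical_value * np.sqrt(V / n)
    return theta_bar - half_width, theta_bar + half_width
```

The interval is centred on the averaged iterate, and its width comes entirely from how the running averages fluctuate along the trajectory.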
Abstract
Reinforcement learning (RL) has emerged as a key approach for training agents in complex and uncertain environments. Incorporating statistical inference in RL algorithms is essential for understanding and managing uncertainty in model performance. This paper introduces a generalized framework for time-varying batch-averaged Q-learning, termed sample-averaged Q-learning (SA-QL), which extends traditional single-sample Q-learning by aggregating samples of rewards and next states to better account for data variability and uncertainty. We leverage the functional central limit theorem (FCLT) to establish a novel framework that provides insights into the asymptotic normality of the sample-averaged algorithm under mild conditions. Additionally, we develop a random scaling method for interval estimation, enabling the construction of confidence intervals without requiring extra hyperparameters.…
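The batch-averaged update described in the abstract can be sketched as follows. This is a minimal tabular illustration under stated assumptions, not the authors' implementation: it assumes a synchronous setting with a generative model, Gaussian reward noise, and a toy random MDP, and the names `make_random_mdp`, `sa_q_learning`, `batch_size`, `step_size`, and `noise_sd` are all hypothetical. With a batch size of 1 it reduces to ordinary single-sample Q-learning.

```python
import numpy as np

def make_random_mdp(n_states, n_actions, rng):
    """Hypothetical toy MDP: transition kernel P[s, a, s'] and mean rewards R[s, a]."""
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
    return P, R

def sa_q_learning(P, R, gamma, n_iters, batch_size, step_size, rng, noise_sd=0.1):
    """Synchronous sample-averaged Q-learning sketch.

    batch_size(t) and step_size(t) are callables giving the (possibly
    time-varying) batch size B_t and learning rate eta_t at iteration t.
    """
    n_states, n_actions = R.shape
    Q = np.zeros((n_states, n_actions))
    for t in range(1, n_iters + 1):
        B, eta = batch_size(t), step_size(t)
        v = Q.max(axis=1)  # greedy value of each state under the current Q
        target = np.empty_like(Q)
        for s in range(n_states):
            for a in range(n_actions):
                # Draw B_t next states and noisy rewards for the pair (s, a).
                next_states = rng.choice(n_states, size=B, p=P[s, a])
                rewards = R[s, a] + noise_sd * rng.standard_normal(B)
                # Average the B_t empirical Bellman targets.
                target[s, a] = np.mean(rewards + gamma * v[next_states])
        # Standard stochastic-approximation step toward the averaged target.
        Q = (1.0 - eta) * Q + eta * target
    return Q
```

For example, `sa_q_learning(P, R, 0.9, 1000, lambda t: 5, lambda t: t ** -0.6, rng)` uses a constant batch of 5 samples per state-action pair, while a growing schedule such as `lambda t: 1 + t // 100` is one way to make the batch size time-varying.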
Taxonomy
Topics: Fault Detection and Control Systems · Face and Expression Recognition · Neural Networks and Applications
Methods: Random Scaling · Q-Learning
