Online Statistical Inference of Constant Sample-averaged Q-Learning
Saunak Kumar Panda, Tong Li, Ruiqi Liu, Yisha Xiang

TL;DR
This paper introduces a framework for online statistical inference in sample-averaged Q-learning, enabling confidence interval construction and performance assessment in noisy reinforcement learning environments.
Contribution
It adapts the functional central limit theorem (FCLT) to sample-averaged Q-learning and uses it to construct confidence intervals for Q-values via random scaling, enabling online inference in reinforcement learning.
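Random scaling, the interval construction named in the abstract, studentizes the running average of the iterates with a statistic built from the average's own path instead of a plug-in variance estimate. The following is a minimal sketch for one scalar coordinate. It assumes the random-scaling recursion of Lee et al. (2022) for averaged stochastic approximation and is not the authors' code. The class name `RandomScalingCI` is hypothetical, and the constant 6.747 is the commonly tabulated 95% two-sided critical value of the pivot. The paper's constant-step setting may require adjustments not shown here.

```python
import math
import random

# Two-sided 95% critical value of the random-scaling pivot
# W(1) / sqrt(integral_0^1 (W(r) - r*W(1))^2 dr), as tabulated by Lee et al. (2022).
CRIT_95 = 6.747


class RandomScalingCI:
    """Online random-scaling confidence interval for one scalar parameter.

    Assumption: `update` is fed the successive iterates theta_1, theta_2, ...
    of a stochastic-approximation scheme (e.g. one Q(s, a) entry). Only five
    numbers are stored, so each update costs O(1) time and memory.
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0  # running average bar_theta_n
        self.a = 0.0     # sum_s s^2 * bar_theta_s^2
        self.b = 0.0     # sum_s s^2 * bar_theta_s
        self.c = 0.0     # sum_s s^2

    def update(self, theta):
        self.n += 1
        n = self.n
        self.mean += (theta - self.mean) / n
        w = float(n * n)
        self.a += w * self.mean ** 2
        self.b += w * self.mean
        self.c += w

    def variance(self):
        # V_n = (1/n^2) * sum_s s^2 * (bar_theta_s - bar_theta_n)^2, expanded into
        # the three running sums so that past iterates never need to be stored.
        n, m = self.n, self.mean
        v = (self.a - 2.0 * m * self.b + m * m * self.c) / (n * n)
        return max(v, 0.0)  # clip tiny negative values caused by round-off

    def interval(self, crit=CRIT_95):
        # sqrt(n) * (bar_theta_n - theta*) / sqrt(V_n) is asymptotically pivotal,
        # so the interval is bar_theta_n +/- crit * sqrt(V_n / n).
        half = crit * math.sqrt(self.variance() / self.n)
        return self.mean - half, self.mean + half


if __name__ == "__main__":
    rng = random.Random(0)
    ci = RandomScalingCI()
    for _ in range(10_000):
        ci.update(1.0 + rng.gauss(0.0, 1.0))  # toy stream centered at 1.0
    print(ci.mean, ci.interval())
```

To keep the procedure online, call `update` once per iteration and call `interval` only when a report is needed. A multivariate version would track outer-product sums in place of the scalar `a` and vector sums in place of `b`.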
Findings
The proposed method achieves coverage rates close to the nominal level in experiments.
Confidence intervals are constructed for both a toy grid world problem and a real-world dynamic resource-matching problem.
The approach improves understanding of Q-learning stability and uncertainty.
Abstract
Reinforcement learning algorithms have been widely used for decision-making tasks in various domains. However, their performance can suffer from high variance and instability, particularly in environments with noise or sparse rewards. In this paper, we propose a framework for online statistical inference in a sample-averaged Q-learning approach. We adapt the functional central limit theorem (FCLT) to the modified algorithm under some general conditions and then construct confidence intervals for the Q-values via random scaling. We conduct experiments that perform inference via random scaling on both the modified approach and its traditional counterpart, Q-learning, and report their coverage rates and confidence interval widths on two problems: a grid world problem as a simple toy example and a dynamic resource-matching problem as a real-world example for…
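To make "sample-averaged" concrete, here is a minimal sketch on a hypothetical two-state, two-action MDP. It assumes that the modification replaces the single-sample Bellman target of Q-learning with an average over `batch` independent draws from a generative model while keeping a constant step size `alpha`. The MDP, all parameter values, and the function names are illustrative assumptions, not the paper's grid world or resource-matching setups. In this sketch the running (Polyak) average `q_bar` is the quantity around which a random-scaling interval would be built.

```python
import random

# Hypothetical 2-state, 2-action MDP used only for illustration (not from the paper).
GAMMA = 0.5
R_MEAN = [[1.0, 0.0], [2.0, 0.5]]  # mean reward for (state, action)
P_TO_1 = [[0.3, 0.7], [0.6, 0.2]]  # probability that the next state is 1
REWARD_SD = 1.0                    # standard deviation of Gaussian reward noise


def sample_transition(rng, s, a):
    """Draw one noisy (reward, next_state) pair from the generative model."""
    r = R_MEAN[s][a] + rng.gauss(0.0, REWARD_SD)
    s_next = 1 if rng.random() < P_TO_1[s][a] else 0
    return r, s_next


def optimal_q(tol=1e-12):
    """Exact Q* by value iteration, used only to check the estimates."""
    q = [[0.0, 0.0], [0.0, 0.0]]
    while True:
        v = [max(row) for row in q]
        new = [[R_MEAN[s][a] + GAMMA * ((1 - P_TO_1[s][a]) * v[0] + P_TO_1[s][a] * v[1])
                for a in range(2)] for s in range(2)]
        diff = max(abs(new[s][a] - q[s][a]) for s in range(2) for a in range(2))
        q = new
        if diff < tol:
            return q


def sample_averaged_q_learning(steps=5000, batch=8, alpha=0.05, seed=0):
    """Constant-step Q-learning whose target is averaged over `batch` samples.

    Assumption: every (s, a) pair is updated synchronously from a generative
    model. Returns the final iterate and the running average of all iterates.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]
    q_bar = [[0.0, 0.0], [0.0, 0.0]]
    for t in range(1, steps + 1):
        v = [max(row) for row in q]  # freeze V_t so all pairs use the same iterate
        new = [[0.0, 0.0], [0.0, 0.0]]
        for s in range(2):
            for a in range(2):
                target = 0.0
                for _ in range(batch):
                    r, s_next = sample_transition(rng, s, a)
                    target += r + GAMMA * v[s_next]
                target /= batch  # sample-averaged Bellman target
                new[s][a] = (1 - alpha) * q[s][a] + alpha * target
        q = new
        for s in range(2):
            for a in range(2):
                q_bar[s][a] += (q[s][a] - q_bar[s][a]) / t  # Polyak average
    return q, q_bar


if __name__ == "__main__":
    q_final, q_avg = sample_averaged_q_learning()
    print("Q*  :", optimal_q())
    print("avg :", q_avg)
```

With `batch=1`, the loop reduces to ordinary synchronous Q-learning. The two variants can therefore be compared by changing a single argument.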
