Uncertainty Quantification and Exploration for Reinforcement Learning
Yi Zhu, Jing Dong, Henry Lam

TL;DR
This paper develops statistical tools for quantifying uncertainty in reinforcement learning, deriving asymptotic distributions of Q-values, and introduces a new exploration strategy that outperforms benchmarks in experiments.
Contribution
It provides the first explicit asymptotic variance formulas for Q-values in RL and proposes a novel exploration policy based on these statistical insights.
Findings
Derived closed-form asymptotic variances for Q-values.
Constructed valid confidence regions for RL quantities.
The proposed Q-OCBA exploration policy outperforms benchmark policies in the paper's experiments.
Abstract
We investigate statistical uncertainty quantification for reinforcement learning (RL) and its implications in exploration policy. Despite ever-growing literature on RL applications, fundamental questions about inference and error quantification, such as large-sample behaviors, appear to remain quite open. In this paper, we fill in the literature gap by studying the central limit theorem behaviors of estimated Q-values and value functions under various RL settings. In particular, we explicitly identify closed-form expressions of the asymptotic variances, which allow us to efficiently construct asymptotically valid confidence regions for key RL quantities. Furthermore, we utilize these asymptotic expressions to design an effective exploration strategy, which we call Q-value-based Optimal Computing Budget Allocation (Q-OCBA). The policy relies on maximizing the relative discrepancies among…
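The central-limit behavior described in the abstract can be illustrated numerically. The sketch below is not the paper's method and does not use its closed-form variance formulas, which are not reproduced on this page. It assumes a hypothetical 2-state, 2-action MDP with known deterministic rewards, estimates the transition probabilities from n sampled transitions per state-action pair, and checks by brute-force replication that the spread of sqrt(n) * (Q_hat - Q) stays roughly constant as n grows. All names and numbers (`P`, `R`, `GAMMA`, `solve_q`, `estimate_q`, `scaled_error_std`) are illustrative assumptions.

```python
import random
import statistics

GAMMA = 0.9
# Hypothetical 2-state, 2-action MDP (illustrative, not from the paper).
# P[s][a] = probability of moving to state 0 (otherwise to state 1).
P = [[0.8, 0.3], [0.5, 0.1]]
# R[s][a] = deterministic reward, assumed known to the learner.
R = [[0.0, 0.0], [0.0, 2.0]]


def solve_q(p, iters=300):
    """Value iteration for the optimal Q-values under transition model p."""
    q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(iters):
        v = [max(q[0]), max(q[1])]
        q = [[R[s][a] + GAMMA * (p[s][a] * v[0] + (1 - p[s][a]) * v[1])
              for a in range(2)] for s in range(2)]
    return q


def estimate_q(n, rng):
    """Estimate P from n sampled transitions per (s, a), then solve for Q."""
    p_hat = [[sum(rng.random() < P[s][a] for _ in range(n)) / n
              for a in range(2)] for s in range(2)]
    return solve_q(p_hat)


def scaled_error_std(n, reps, rng, q_true):
    """Sample std of sqrt(n) * (Q_hat(0,0) - Q(0,0)) over reps replications."""
    errs = [(n ** 0.5) * (estimate_q(n, rng)[0][0] - q_true[0][0])
            for _ in range(reps)]
    return statistics.stdev(errs)


rng = random.Random(0)
q_true = solve_q(P)
std_small = scaled_error_std(200, 200, rng, q_true)
std_large = scaled_error_std(800, 200, rng, q_true)
print("scaled std at n=200:", std_small)
print("scaled std at n=800:", std_large)
```

If the estimation error shrinks at the usual 1/sqrt(n) rate, the two printed values should be of similar size, and dividing either by sqrt(n) gives a rough standard error from which an approximate normal confidence interval for Q(0,0) could be formed. A closed-form asymptotic variance, as described in the abstract, would serve the same purpose without the repeated simulation used here.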
Taxonomy
Topics: Probabilistic and Robust Engineering Design · Formal Methods in Verification · Adversarial Robustness in Machine Learning
