Online Bootstrap Inference For Policy Evaluation in Reinforcement Learning

Pratik Ramprasad; Yuantong Li; Zhuoran Yang; Zhaoran Wang; Will Wei Sun; Guang Cheng

arXiv:2108.03706 · stat.ML · June 29, 2022

TL;DR

This paper studies the online bootstrap as a method for statistical inference in reinforcement learning, focusing on policy evaluation with the TD and GTD algorithms. It shows that the method is distributionally consistent and reports that it performs well in experiments.

Contribution

The paper applies the online bootstrap, an existing inference method for linear stochastic approximation, to RL policy evaluation. It extends the method to settings with Markov noise and supports it with theoretical and empirical validation.

Findings

01. The online bootstrap is distributionally consistent for RL policy evaluation.

02. The method performs well across various real RL environments.

03. It extends bootstrap inference to Markov noise scenarios in RL.

Abstract

The recent emergence of reinforcement learning has created a demand for robust statistical inference methods for the parameter estimates computed using these algorithms. Existing methods for statistical inference in online learning are restricted to settings involving independently sampled observations, while existing statistical inference methods in reinforcement learning (RL) are limited to the batch setting. The online bootstrap is a flexible and efficient approach for statistical inference in linear stochastic approximation algorithms, but its efficacy in settings involving Markov noise, such as RL, has yet to be explored. In this paper, we study the use of the online bootstrap method for statistical inference in RL. In particular, we focus on the temporal difference (TD) learning and Gradient TD (GTD) learning algorithms, which are themselves special instances of linear stochastic…
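The abstract names the ingredients (TD learning as linear stochastic approximation, plus an online bootstrap) but does not state the update rules. The following is a minimal sketch of one common form of the idea, not the authors' implementation. It assumes a multiplier-style bootstrap in which each replicate repeats the TD(0) update on the same transition, scaled by an independent random weight with mean 1 and variance 1. The two-state reward process, the step-size schedule, the exponential weight distribution, the normal-approximation interval, and the function name `online_bootstrap_td` are all hypothetical choices made for illustration.

```python
import math
import random


def online_bootstrap_td(num_steps=5000, num_boot=30, gamma=0.5, seed=0):
    """TD(0) with averaged iterates plus multiplier-bootstrap replicates."""
    rng = random.Random(seed)
    # Hypothetical 2-state Markov reward process (illustrative, not from the paper).
    trans = [[0.7, 0.3], [0.4, 0.6]]  # trans[s][s2] = P(next state s2 | state s)
    mean_reward = [1.0, 0.0]
    dim = 2  # one-hot features, so theta[s] is the value estimate of state s

    theta = [0.0] * dim
    theta_avg = [0.0] * dim
    boot = [[0.0] * dim for _ in range(num_boot)]
    boot_avg = [[0.0] * dim for _ in range(num_boot)]

    state = 0
    for t in range(1, num_steps + 1):
        next_state = 0 if rng.random() < trans[state][0] else 1
        reward = mean_reward[state] + rng.gauss(0.0, 0.1)
        step = 0.5 / t ** 0.6  # assumed polynomially decaying step size

        # Standard TD(0) update on the observed transition.
        td_error = reward + gamma * theta[next_state] - theta[state]
        theta[state] += step * td_error

        # Each replicate reuses the same transition with its own random weight.
        for b in range(num_boot):
            weight = rng.expovariate(1.0)  # assumed weight law: mean 1, variance 1
            err_b = reward + gamma * boot[b][next_state] - boot[b][state]
            boot[b][state] += step * weight * err_b
            for j in range(dim):
                boot_avg[b][j] += (boot[b][j] - boot_avg[b][j]) / t

        # Running average of the main iterate.
        for j in range(dim):
            theta_avg[j] += (theta[j] - theta_avg[j]) / t
        state = next_state

    # Spread of the replicates around the point estimate gives a standard error.
    intervals = []
    for j in range(dim):
        se = math.sqrt(sum((bb[j] - theta_avg[j]) ** 2 for bb in boot_avg) / num_boot)
        intervals.append((theta_avg[j] - 1.96 * se, theta_avg[j] + 1.96 * se))
    return theta_avg, intervals
```

Calling `online_bootstrap_td()` returns the averaged value estimates and one approximate 95% interval per state. Everything is updated in a single pass over the data stream, so no past transitions need to be stored, which is the property that makes this kind of bootstrap suitable for the online setting.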

Taxonomy

Topics: Advanced Multi-Objective Optimization Algorithms · Distributed Sensor Networks and Detection Algorithms · Advanced Bandit Algorithms Research