Online Estimation and Inference for Robust Policy Evaluation in Reinforcement Learning
Weidong Liu, Jiyuan Tu, Xi Chen, Yichen Zhang

TL;DR
This paper introduces a fully online robust policy evaluation method for reinforcement learning that handles outlier contamination and heavy-tailed rewards and supports online statistical inference. The method is validated in simulations and real-world experiments.
Contribution
It develops a fully online robust policy evaluation framework, establishes a Bahadur-type representation of the estimator, and provides online inference procedures that account for outlier contamination and heavy-tailed rewards in reinforcement learning.
Findings
Effective handling of outliers and heavy-tailed rewards.
Reliable online statistical inference for policy evaluation.
Validated through simulations and real-world experiments.
Abstract
Reinforcement learning has emerged as one of the prominent topics attracting attention in modern statistical learning, with policy evaluation being a key component. Unlike the traditional machine learning literature on this topic, our work emphasizes statistical inference for the model parameters and value functions of reinforcement learning algorithms. While most existing analyses assume random rewards to follow standard distributions, we embrace the concept of robust statistics in reinforcement learning by simultaneously addressing issues of outlier contamination and heavy-tailed rewards within a unified framework. In this paper, we develop a fully online robust policy evaluation procedure, and establish the Bahadur-type representation of our estimator. Furthermore, we develop an online procedure to efficiently conduct statistical inference based on the asymptotic distribution. This…
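To make the idea of online robust policy evaluation concrete, here is a minimal sketch. It is not the authors' estimator: it assumes a tabular TD(0) update whose temporal-difference error is clipped by the Huber score function, and the two-state chain, the threshold `tau`, the step-size schedule, and every function name are hypothetical choices made for illustration.

```python
import random


def huber_psi(x, tau):
    """Derivative of the Huber loss: identity on [-tau, tau], clipped outside."""
    return max(-tau, min(tau, x))


def robust_td0(transitions, n_states, gamma=0.9, tau=1.0, lr0=0.5):
    """Tabular TD(0) with a Huber-clipped temporal-difference error.

    `transitions` is an iterable of (state, reward, next_state) consumed one
    item at a time, so only the current value table is kept in memory.
    """
    v = [0.0] * n_states
    for t, (s, r, s_next) in enumerate(transitions):
        delta = r + gamma * v[s_next] - v[s]   # ordinary TD error
        step = lr0 / (1 + t) ** 0.5            # decaying step size (assumed schedule)
        v[s] += step * huber_psi(delta, tau)   # each update is bounded by step * tau
    return v


def contaminated_stream(n, eps=0.05, seed=0):
    """Hypothetical two-state chain with Gaussian reward noise and gross outliers."""
    rng = random.Random(seed)
    s = 0
    for _ in range(n):
        s_next = rng.randrange(2)              # uniform transitions
        r = (1.0 if s == 0 else 0.0) + rng.gauss(0.0, 0.1)
        if rng.random() < eps:
            r = 1e4                            # outlier contamination
        yield s, r, s_next
        s = s_next


# tau = inf disables clipping and recovers plain TD(0) for comparison.
robust = robust_td0(contaminated_stream(5000), n_states=2, tau=1.0)
plain = robust_td0(contaminated_stream(5000), n_states=2, tau=float("inf"))
print("robust:", robust)
print("plain: ", plain)
```

Because a single outlier can move the clipped estimate by at most `step * tau`, the robust table stays bounded, while the unclipped update absorbs each outlier in full. The threshold trades robustness against bias: with one-sided contamination like the stream above, a clipped estimate is still shifted, only by a bounded amount.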
Taxonomy
Topics: Distributed Sensor Networks and Detection Algorithms · Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics
