Online Estimation and Inference for Robust Policy Evaluation in Reinforcement Learning

Weidong Liu; Jiyuan Tu; Xi Chen; Yichen Zhang

arXiv:2310.02581 · stat.ML · March 4, 2025

TL;DR

This paper introduces a fully online robust policy evaluation method for reinforcement learning that handles outliers and heavy-tailed rewards, providing reliable statistical inference and validated through simulations and real-world tests.

Contribution

The paper develops a novel online robust policy evaluation framework, establishes a Bahadur-type representation of the resulting estimator, and provides online inference procedures, addressing outlier contamination and heavy-tailed rewards in reinforcement learning.

Findings

01 Effective handling of outliers and heavy-tailed rewards.

02 Reliable online statistical inference for policy evaluation.

03 Validated through simulations and real-world experiments.

Abstract

Reinforcement learning has emerged as one of the prominent topics attracting attention in modern statistical learning, with policy evaluation being a key component. Unlike the traditional machine learning literature on this topic, our work emphasizes statistical inference for the model parameters and value functions of reinforcement learning algorithms. While most existing analyses assume random rewards to follow standard distributions, we embrace the concept of robust statistics in reinforcement learning by simultaneously addressing issues of outlier contamination and heavy-tailed rewards within a unified framework. In this paper, we develop a fully online robust policy evaluation procedure, and establish the Bahadur-type representation of our estimator. Furthermore, we develop an online procedure to efficiently conduct statistical inference based on the asymptotic distribution. This…
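
To make the idea of robust online policy evaluation concrete, the following is a minimal sketch, not the estimator from the paper. It assumes a linear value function and swaps in a generic technique: a TD(0) update whose temporal-difference error is clipped by the Huber score function, so that a single outlying reward can move the parameter only by a bounded amount. The names `huber_psi`, `robust_td0`, `tau`, `step0`, and `decay` are hypothetical and chosen for illustration only.

```python
import numpy as np


def huber_psi(u, tau):
    """Huber score: identity on [-tau, tau], clipped to +/- tau outside."""
    return float(np.clip(u, -tau, tau))


def robust_td0(transitions, dim, gamma=0.9, tau=1.0, step0=0.5, decay=0.6):
    """One-pass linear TD(0) with a Huber-clipped TD error.

    Illustrative assumption, not the paper's procedure.
    transitions: iterable of (phi, reward, phi_next) with phi, phi_next
    feature vectors of length `dim`.
    """
    theta = np.zeros(dim)
    for t, (phi, reward, phi_next) in enumerate(transitions, start=1):
        # Temporal-difference error under the current parameter.
        td_error = reward + gamma * (phi_next @ theta) - phi @ theta
        # Polynomially decaying step size step0 * t^(-decay).
        step = step0 * t ** (-decay)
        # Clipping bounds the influence of any single reward by step * tau.
        theta = theta + step * huber_psi(td_error, tau) * phi
    return theta


# Usage: one state with feature [1.0], terminal next state, reward 1,
# plus a single gross outlier in the reward stream.
phi, phi_next = np.array([1.0]), np.array([0.0])
stream = [(phi, 1.0, phi_next)] * 2000
stream[999] = (phi, 1e6, phi_next)
theta_hat = robust_td0(stream, dim=1, gamma=0.0)
```

Each observation is used once and then discarded, which is what makes the update online. The clipping threshold `tau` trades robustness against bias, and how to choose it is not addressed by this sketch.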

Taxonomy

Topics: Distributed Sensor Networks and Detection Algorithms · Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics