Detecting Rewards Deterioration in Episodic Reinforcement Learning

Ido Greenberg; Shie Mannor

arXiv:2010.11660 · cs.LG · November 1, 2021 · 1 citation

TL;DR

This paper introduces a statistical test for detecting reward deterioration in episodic reinforcement learning. The test requires no environment model, can be applied online, and outperforms standard methods in a range of control scenarios.

Contribution

The paper proposes a multivariate mean-shift detection method tailored to episodic RL rewards, together with a bootstrap-based mechanism that controls the false-alarm rate when the test is applied online.
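
The idea of bootstrap-based false-alarm control can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes each episode has already been reduced to one scalar statistic, that reference episodes are exchangeable (so they can be resampled with replacement), and that the online monitor is a rolling mean over a fixed window. The names `bootstrap_threshold`, `first_alarm`, `window`, and `horizon` are hypothetical.

```python
import numpy as np


def bootstrap_threshold(ref_stats, window, horizon, alpha=0.05, n_boot=2000, seed=0):
    """Pick an alarm threshold from reference (non-deteriorated) episodes.

    ref_stats: one scalar statistic per reference episode (assumed exchangeable).
    window:    number of episodes averaged by the online monitor.
    horizon:   number of episodes monitored in one run (must be >= window).
    Returns a value t such that, on resampled reference data, the lowest
    rolling mean within `horizon` episodes drops below t in about an
    `alpha` fraction of the simulated runs.
    """
    rng = np.random.default_rng(seed)
    ref_stats = np.asarray(ref_stats, dtype=float)
    kernel = np.ones(window) / window
    minima = np.empty(n_boot)
    for b in range(n_boot):
        # Simulate one monitoring run under "no deterioration".
        sample = rng.choice(ref_stats, size=horizon, replace=True)
        rolling = np.convolve(sample, kernel, mode="valid")
        minima[b] = rolling.min()
    return float(np.quantile(minima, alpha))


def first_alarm(stream, window, threshold):
    """Index of the first episode at which the rolling mean falls below
    the threshold, or None if no alarm is raised."""
    kernel = np.ones(window) / window
    rolling = np.convolve(np.asarray(stream, dtype=float), kernel, mode="valid")
    hits = np.flatnonzero(rolling < threshold)
    return int(hits[0]) + window - 1 if hits.size else None
```

Taking the quantile of the run-wide minimum, rather than of a single window mean, is what accounts for the repeated looks that an online test makes at the data.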

Findings

01 The test outperforms standard methods by orders of magnitude in detecting reward deterioration.

02 The method applies to any episodic signal and does not rely on an environment model.

03 The method is effective for online detection of performance drift in RL agents.

Abstract

In many RL applications, once training ends, it is vital to detect any deterioration in the agent's performance as soon as possible. Furthermore, this often has to be done without modifying the policy and under minimal assumptions regarding the environment. In this paper, we address this problem by focusing directly on the rewards and testing for degradation. We consider an episodic framework, where the rewards within each episode are neither independent, nor identically distributed, nor Markov. We present this problem as a multivariate mean-shift detection problem with possibly partial observations. We define the mean-shift in a way corresponding to deterioration of a temporal signal (such as the rewards), and derive a test for this problem with optimal statistical power. Empirically, on deteriorated rewards in control problems (generated using various environment modifications), the test is…
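
For intuition about mean-shift detection on correlated within-episode rewards, here is a minimal sketch. It rests on a classical result rather than on the paper's own derivation: if the per-step rewards were Gaussian with known covariance Σ and the mean dropped by the same amount at every time step, the most powerful statistic would be the weighted sum of deviations with weights Σ⁻¹1. The Gaussian model, the uniform shift, the `ridge` regularizer, and the function names are all assumptions made for illustration.

```python
import numpy as np


def fit_reference(rewards):
    """Estimate the per-time-step mean and covariance of rewards.

    rewards: array of shape (n_episodes, T) from non-deteriorated episodes.
    """
    rewards = np.asarray(rewards, dtype=float)
    mu0 = rewards.mean(axis=0)
    sigma0 = np.atleast_2d(np.cov(rewards, rowvar=False))
    return mu0, sigma0


def weighted_mean_statistic(episodes, mu0, sigma0, ridge=1e-6):
    """Average weighted deviation from the reference mean.

    Weights are w = Sigma^{-1} 1, so noisy or redundant (highly correlated)
    time steps count less than a plain mean would let them.
    Lower values suggest deterioration.
    """
    T = mu0.shape[0]
    # Small ridge term (an assumption) keeps the linear solve well conditioned.
    w = np.linalg.solve(sigma0 + ridge * np.eye(T), np.ones(T))
    deviations = np.asarray(episodes, dtype=float) - mu0  # shape (n, T)
    return float((deviations @ w).mean())
```

A possible usage pattern is to call `fit_reference` once on rewards recorded right after training and then evaluate `weighted_mean_statistic` on batches of new episodes, comparing the result against a threshold calibrated on the reference data.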

Code & Models

Repositories

ido90/Rewards-Deterioration-Detection (PyTorch, official)

Videos

Detecting Rewards Deterioration in Episodic Reinforcement Learning · SlidesLive

Taxonomy

Topics: Data Stream Mining Techniques · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research