Detecting Spiky Corruption in Markov Decision Processes

Jason Mancuso; Tomasz Kisielewski; David Lindner; Alok Singh

arXiv:1907.00452·cs.LG·July 2, 2019·1 cites

TL;DR

This paper addresses the challenge of reward corruption in reinforcement learning by characterizing spiky corruptions in CRMDPs, proposing an algorithm to detect corrupt states, and demonstrating its effectiveness in simple environments.

Contribution

It introduces a formal framework for spiky reward corruption in CRMDPs, provides regret bounds, and develops an algorithm to detect corrupt states to enable optimal policy learning.

Findings

01. The environment is solvable under sufficiently spiky reward corruption.

02. The proposed algorithm can detect corrupt states effectively.

03. The algorithm enables learning optimal policies despite reward corruption.

Abstract

Current reinforcement learning methods fail if the reward function is imperfect, i.e. if the agent observes reward different from what it actually receives. We study this problem within the formalism of Corrupt Reward Markov Decision Processes (CRMDPs). We show that if the reward corruption in a CRMDP is sufficiently "spiky", the environment is solvable. We fully characterize the regret bound of a Spiky CRMDP, and introduce an algorithm that is able to detect its corrupt states. We show that this algorithm can be used to learn the optimal policy with any common reinforcement learning algorithm. Finally, we investigate our algorithm in a pair of simple gridworld environments, finding that our algorithm can detect the corrupt states and learn the optimal policy despite the corruption.
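
The abstract does not spell out the detection procedure, so the following is only a rough illustration of what "spiky" corruption might look like in a gridworld, not the paper's algorithm. It assumes a hypothetical rule: a state is suspect when its observed reward differs from the median of its 4-connected neighbours by more than a chosen threshold. The function names, the neighbourhood definition, the threshold, and the sample grid are all assumptions made for this sketch.

```python
from statistics import median


def neighbors(r, c, rows, cols):
    """Yield the 4-connected neighbours of cell (r, c) that lie inside the grid."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            yield nr, nc


def flag_spiky_states(observed, threshold):
    """Return the cells whose observed reward is a local spike.

    Hypothetical rule (not taken from the paper): a cell is flagged when
    its observed reward differs from the median of its neighbours'
    observed rewards by more than `threshold`.
    Assumes a rectangular grid with at least two cells, so every cell
    has at least one neighbour.
    """
    rows, cols = len(observed), len(observed[0])
    flagged = set()
    for r in range(rows):
        for c in range(cols):
            local = [observed[nr][nc] for nr, nc in neighbors(r, c, rows, cols)]
            if abs(observed[r][c] - median(local)) > threshold:
                flagged.add((r, c))
    return flagged


# Made-up observed rewards: the centre cell reports a reward far above
# everything around it, which is the kind of spike the rule looks for.
observed = [
    [0.0, 0.1, 0.0],
    [0.1, 5.0, 0.1],
    [0.0, 0.1, 0.0],
]
suspects = flag_spiky_states(observed, threshold=1.0)
print(suspects)
```

Using the median rather than the mean keeps a single spiked neighbour from dragging its surroundings over the threshold, which is why only the centre cell is flagged here. Once suspect states are identified, an agent could ignore or down-weight their observed rewards before running an ordinary reinforcement learning algorithm, in the spirit of the abstract's claim.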

Code & Models

Repositories

jvmancuso/safe-grid-agents (tf, Official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research