Detecting Spiky Corruption in Markov Decision Processes
Jason Mancuso, Tomasz Kisielewski, David Lindner, Alok Singh

TL;DR
This paper addresses reward corruption in reinforcement learning. It characterizes spiky corruption in Corrupt Reward Markov Decision Processes (CRMDPs), proposes an algorithm that detects corrupt states, and demonstrates the algorithm in simple gridworld environments.
Contribution
The paper introduces a formal framework for spiky reward corruption in CRMDPs, characterizes the regret bound of Spiky CRMDPs, and develops an algorithm that detects corrupt states so that an optimal policy can still be learned.
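To give a rough feel for why spikiness makes corrupt states detectable, here is a minimal sketch under assumptions that are ours, not the paper's: the true reward changes by at most `lipschitz` between adjacent gridworld cells, corrupt cells are isolated (no two are adjacent), every uncorrupted cell has at least one uncorrupted neighbour, and each corrupt cell's observed reward is off by more than twice the bound. Under those assumptions a corrupt cell disagrees with every neighbour by more than `lipschitz`, while an uncorrupted cell agrees with at least one. The function `detect_spiky_states` and its threshold rule are hypothetical illustrations, not the authors' detection algorithm.

```python
def detect_spiky_states(rewards, lipschitz=1.0):
    """Flag grid cells whose observed reward is a spike relative to all 4-neighbours.

    Hypothetical rule: assumes the true reward differs by at most `lipschitz`
    between adjacent cells and that corrupt cells are isolated.
    """
    rows, cols = len(rewards), len(rewards[0])
    flagged = set()
    for i in range(rows):
        for j in range(cols):
            # Observed rewards of the in-bounds up/down/left/right neighbours.
            neighbours = [
                rewards[a][b]
                for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                if 0 <= a < rows and 0 <= b < cols
            ]
            # A spike disagrees with *every* neighbour by more than the assumed bound.
            if neighbours and all(abs(rewards[i][j] - r) > lipschitz for r in neighbours):
                flagged.add((i, j))
    return flagged


# Smooth true reward: increases by 0.5 per column, so adjacent cells differ by <= 0.5.
grid = [[0.5 * j for j in range(4)] for _ in range(3)]
grid[1][2] = 10.0  # hypothetical corrupt cell with a spiky observed reward
print(detect_spiky_states(grid, lipschitz=1.0))  # {(1, 2)}
```

Requiring disagreement with all neighbours, rather than with their average, keeps a clean cell next to a spike from being flagged: it still agrees with its other, uncorrupted neighbour.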
Findings
A CRMDP is solvable if its reward corruption is sufficiently spiky.
The proposed algorithm detects the corrupt states in two simple gridworld environments.
Combined with a common reinforcement learning algorithm, it enables learning the optimal policy despite reward corruption.
Abstract
Current reinforcement learning methods fail if the reward function is imperfect, i.e., if the agent observes a reward different from the one it actually receives. We study this problem within the formalism of Corrupt Reward Markov Decision Processes (CRMDPs). We show that if the reward corruption in a CRMDP is sufficiently "spiky", the environment is solvable. We fully characterize the regret bound of a Spiky CRMDP and introduce an algorithm that is able to detect its corrupt states. We show that this algorithm can be used to learn the optimal policy with any common reinforcement learning algorithm. Finally, we investigate our algorithm in a pair of simple gridworld environments, finding that it can detect the corrupt states and learn the optimal policy despite the corruption.
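The abstract says the detector can be paired with any common reinforcement learning algorithm. One simple way to picture this, which is our assumption for illustration and not necessarily the authors' procedure, is to overwrite the reward of each flagged state with an estimate from its unflagged neighbours and then run an unmodified learner. The sketch below uses value iteration on a hypothetical 7-state corridor where reward is collected on entering a state; `sanitize_rewards`, `greedy_policy`, the neighbour-mean rule, and the flagged set `{1}` are all hypothetical.

```python
def sanitize_rewards(observed, flagged):
    """Replace each flagged state's reward by the mean of its unflagged neighbours (hypothetical rule)."""
    n = len(observed)
    clean = list(observed)
    for s in flagged:
        nbrs = [observed[t] for t in (s - 1, s + 1) if 0 <= t < n and t not in flagged]
        clean[s] = sum(nbrs) / len(nbrs) if nbrs else 0.0
    return clean


def greedy_policy(rewards, gamma=0.9, iters=500):
    """Value iteration on a deterministic corridor; returns the greedy action (-1 or +1) per state."""
    n = len(rewards)

    def step(s, a):
        # Actions are -1 (left) and +1 (right); moving into a wall leaves the agent in place.
        return min(max(s + a, 0), n - 1)

    values = [0.0] * n
    for _ in range(iters):
        values = [max(rewards[step(s, a)] + gamma * values[step(s, a)] for a in (-1, 1))
                  for s in range(n)]
    return [max((-1, 1), key=lambda a: rewards[step(s, a)] + gamma * values[step(s, a)])
            for s in range(n)]


observed = [0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 1.0]  # state 1 carries a hypothetical spiky corrupt reward
flagged = {1}                                     # assumed output of a corrupt-state detector
print(greedy_policy(observed)[3])                           # -1: heads for the corrupt spike
print(greedy_policy(sanitize_rewards(observed, flagged)))   # [1, 1, 1, 1, 1, 1, 1]: heads for the true goal
```

Because the learner itself is untouched, value iteration could be swapped for Q-learning or any other method that consumes the sanitized rewards.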
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research
