Execute Order 66: Targeted Data Poisoning for Reinforcement Learning

Harrison Foley; Liam Fowl; Tom Goldstein; Gavin Taylor

arXiv:2201.00762 · cs.LG · July 29, 2022 · 1 citation

TL;DR

This paper presents a targeted data poisoning attack on reinforcement learning that makes an agent misbehave only at specific target states. The attacker minimally perturbs a small fraction of training observations and needs no control over the victim's policy or rewards.

Contribution

The paper adapts gradient alignment, a recent data poisoning technique, to reinforcement learning. The resulting targeted attack requires no control over policy or rewards and is demonstrated on two Atari games.
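
Gradient alignment is presumably the gradient-matching objective from Geiping et al.'s "Witches' Brew" (whose authors include Fowl, Taylor, and Goldstein): bounded perturbations are chosen so that the training gradient on the poisoned samples points in the same direction as the gradient of an adversarial loss at the target. One common statement of that objective is sketched below. The page does not say how the paper instantiates the losses for reinforcement learning, so the symbols are assumptions: θ are the policy parameters, s_t is the target state, a_adv the action the attacker wants there, and s_i, a_i (i = 1..P) are the poisoned observations with their unchanged actions.

```latex
\min_{\Delta_1,\dots,\Delta_P}\;
1 - \frac{\left\langle \nabla_\theta \ell_{\mathrm{adv}}(s_t, a_{\mathrm{adv}};\theta),\;
          \sum_{i=1}^{P} \nabla_\theta \ell(s_i + \Delta_i, a_i;\theta) \right\rangle}
         {\left\| \nabla_\theta \ell_{\mathrm{adv}}(s_t, a_{\mathrm{adv}};\theta) \right\|
          \left\| \sum_{i=1}^{P} \nabla_\theta \ell(s_i + \Delta_i, a_i;\theta) \right\|}
\qquad \text{s.t.}\quad \|\Delta_i\|_\infty \le \varepsilon
```

Driving this cosine distance toward zero means that each ordinary training step on the poisoned data also descends the adversarial loss at s_t, even though the labels a_i are never touched.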

Findings

01 Targeted misbehavior successfully induced in Atari games

02 Only minimal modifications to a small fraction of training observations required

03 Effective in two Atari games of varying difficulty

Abstract

Data poisoning for reinforcement learning has historically focused on general performance degradation, and targeted attacks have been successful via perturbations that involve control of the victim's policy and rewards. We introduce an insidious poisoning attack for reinforcement learning which causes agent misbehavior only at specific target states - all while minimally modifying a small fraction of training observations without assuming any control over policy or reward. We accomplish this by adapting a recent technique, gradient alignment, to reinforcement learning. We test our method and demonstrate success in two Atari games of varying difficulty.
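
A minimal NumPy sketch of gradient-alignment poison crafting follows. It is not the authors' implementation. It assumes a hypothetical toy victim, a linear softmax policy trained with cross-entropy on (observation, action) pairs, plus a hypothetical target state `x_target`, an attacker-chosen action `a_adv`, and plain signed gradient descent inside an ℓ∞ ball of radius `eps`. The actions `Y` of the poisoned observations are never changed, mirroring the premise that the attacker controls neither policy nor reward. All function names and data are illustrative.

```python
import numpy as np

# Hypothetical toy victim: a linear softmax policy pi(a | x) = softmax(W x),
# trained with cross-entropy on (observation, action) pairs. It stands in
# for the paper's deep RL agent, whose exact losses are not given here.

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ce_grad_W(W, x, y):
    """Gradient of -log softmax(W x)[y] w.r.t. W, i.e. outer(p - e_y, x)."""
    p = softmax(W @ x)
    p[y] -= 1.0
    return np.outer(p, x)

def alignment_loss_and_grad(W, g_adv, X0, Y, delta):
    """Return 1 - cos(g_adv, G) and its gradient w.r.t. delta, where G is the
    summed training gradient over the poisoned observations X0 + delta."""
    X = X0 + delta
    P = softmax(X @ W.T)                   # N x A action probabilities
    R = P.copy()
    R[np.arange(len(Y)), Y] -= 1.0         # rows are p_i - e_{y_i}
    G = R.T @ X                            # A x D: sum_i outer(p_i - e_{y_i}, x_i)
    na, ng = np.linalg.norm(g_adv), np.linalg.norm(G)
    cos = np.sum(g_adv * G) / (na * ng)
    M = -(g_adv / (na * ng) - cos * G / ng**2)   # dLoss/dG
    # Chain rule through G with p_i = softmax(W x_i):
    # dLoss/dx_i = M^T (p_i - e_{y_i}) + W^T J_i (M x_i), J_i = diag(p_i) - p_i p_i^T
    V = X @ M.T                            # rows are M x_i
    JV = P * V - P * np.sum(P * V, axis=1, keepdims=True)
    grad = R @ M + JV @ W                  # N x D; dx_i/ddelta_i is the identity
    return 1.0 - cos, grad

def craft_poisons(W, x_target, a_adv, X0, Y, eps=0.1, lr=0.01, steps=200):
    """Signed gradient descent on the alignment loss, projected onto the
    l_inf ball of radius eps; returns the lowest-loss perturbation found."""
    g_adv = ce_grad_W(W, x_target, a_adv)  # adversarial-loss gradient at the target
    delta = np.zeros_like(X0)
    best_delta, best_loss = delta.copy(), np.inf
    for _ in range(steps + 1):
        loss, grad = alignment_loss_and_grad(W, g_adv, X0, Y, delta)
        if loss < best_loss:
            best_delta, best_loss = delta.copy(), loss
        delta = np.clip(delta - lr * np.sign(grad), -eps, eps)
    return best_delta, best_loss

# Usage with random stand-in data (all values hypothetical).
rng = np.random.default_rng(0)
A, D, N = 4, 16, 8                  # actions, observation size, poisoned samples
W = rng.normal(size=(A, D))         # victim's current policy parameters
x_target = rng.normal(size=D)       # target state
a_adv = 2                           # action the attacker wants at x_target
X0 = rng.normal(size=(N, D))        # clean observations selected for poisoning
Y = rng.integers(0, A, size=N)      # their actions, left unchanged
delta, loss = craft_poisons(W, x_target, a_adv, X0, Y)
print(f"alignment loss after crafting: {loss:.3f}")
```

Keeping the lowest-loss perturbation rather than the last iterate ensures the returned poisons never score worse than the unperturbed data under this objective. An attack on a real Atari agent would replace the linear policy with the agent's network and obtain these gradients through automatic differentiation.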

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics