Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning
Amin Rakhsha, Goran Radanovic, Rati Devidze, Xiaojin Zhu, Adish Singla

TL;DR
This paper studies training-time adversarial attacks on reinforcement learning agents in which the attacker poisons the learning environment, that is, its rewards or transition dynamics. It shows how an attacker can stealthily steer an agent into adopting a target policy of the attacker's choosing, exposing a security vulnerability in RL training.
Contribution
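The paper introduces an optimization framework for finding optimal stealthy environment-poisoning attacks on RL agents under different measures of attack cost. It gives sufficient conditions under which the attack is feasible, derives lower and upper bounds on the attack cost, and instantiates the attacks in an offline (planning) setting and an online (learning) setting.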
Findings
An attacker can force the agent to adopt a target policy when sufficient technical conditions hold.
Stealthy attacks succeed both when the agent plans in the poisoned environment and when it learns in it.
The results point to a significant security threat for RL applications.
Abstract
We study a security threat to reinforcement learning where an attacker poisons the learning environment to force the agent into executing a target policy chosen by the attacker. As a victim, we consider RL agents whose objective is to find a policy that maximizes average reward in undiscounted infinite-horizon problem settings. The attacker can manipulate the rewards or the transition dynamics in the learning environment at training-time and is interested in doing so in a stealthy manner. We propose an optimization framework for finding an optimal stealthy attack for different measures of attack cost. We provide sufficient technical conditions under which the attack is feasible and provide lower/upper bounds on the attack cost. We instantiate our attacks in two settings: (i) an offline setting where the agent is doing planning in the poisoned environment, and (ii) an…
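The idea of reward poisoning in the average-reward setting can be illustrated with a small sketch. This is not the paper's optimization framework: it is a restricted heuristic, assumed here for illustration, that only subtracts one uniform penalty `delta` from every non-target action and picks the smallest `delta` that makes the target policy better than every other deterministic policy by a margin `eps`. The toy MDP, the function names (`stationary`, `gain`, `penalty_attack`), and the brute-force policy enumeration are all hypothetical, and the sketch assumes every deterministic policy induces an ergodic chain.

```python
from itertools import product


def stationary(P_pi, iters=2000):
    """Approximate the stationary distribution of a row-stochastic matrix
    by power iteration (assumes the chain is ergodic)."""
    n = len(P_pi)
    mu = [1.0 / n] * n
    for _ in range(iters):
        mu = [sum(mu[s] * P_pi[s][t] for s in range(n)) for t in range(n)]
    return mu


def gain(P, R, pi):
    """Average reward of deterministic policy pi, plus its stationary distribution."""
    n = len(pi)
    mu = stationary([P[s][pi[s]] for s in range(n)])
    return sum(mu[s] * R[s][pi[s]] for s in range(n)), mu


def penalty_attack(P, R, target, eps):
    """Heuristic reward poisoning (an assumption of this sketch, not the paper's
    method): subtract one penalty delta from all non-target actions.

    For a policy pi, the margin of the target over pi grows linearly in delta
    with slope d = stationary mass pi puts on states where it deviates from
    the target, so the smallest sufficient delta has a closed form.
    """
    n_states, n_actions = len(R), len(R[0])
    g_target, _ = gain(P, R, target)
    delta = 0.0
    for pi in product(range(n_actions), repeat=n_states):
        if pi == target:
            continue
        g_pi, mu = gain(P, R, pi)
        d = sum(mu[s] for s in range(n_states) if pi[s] != target[s])
        delta = max(delta, (eps + g_pi - g_target) / d)  # d > 0 under ergodicity
    poisoned = [
        [R[s][a] - (0.0 if a == target[s] else delta) for a in range(n_actions)]
        for s in range(n_states)
    ]
    return poisoned, delta


# Toy 2-state, 2-action MDP: P[s][a] is the next-state distribution, R[s][a] the reward.
P = [[[0.9, 0.1], [0.1, 0.9]],
     [[0.9, 0.1], [0.1, 0.9]]]
R = [[1.0, 0.0],
     [0.0, 2.0]]
target = (0, 0)  # policy the attacker wants the agent to adopt
eps = 0.1        # required optimality margin of the target policy

poisoned, delta = penalty_attack(P, R, target, eps)
print("penalty:", delta)
print("poisoned rewards:", poisoned)
```

In the original toy environment the policy `(1, 1)` has the highest average reward; after the penalty is applied, the target `(0, 0)` beats every other deterministic policy by at least `eps`. Restricting the attacker to a single uniform penalty keeps the computation closed-form, but it will generally cost more than an attack that optimizes each reward entry separately.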
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Smart Grid Security and Resilience
