Online Poisoning Attack Against Reinforcement Learning under Black-box Environments
Jianhui Li, Bokang Zhang, Junfeng Wu

TL;DR
This paper introduces an online poisoning attack on reinforcement learning in black-box environments. The attacker manipulates reward functions and state transitions to mislead the agent, and the method is validated in maze-environment experiments.
Contribution
It presents a novel black-box poisoning attack algorithm for reinforcement learning that handles unknown environment dynamics through sample-based gradient estimation.
Findings
Effective poisoning demonstrated in maze environment
Algorithm successfully manipulates reward and transition data
Addresses challenges of unknown environment dynamics
Abstract
This paper proposes an online environment poisoning algorithm tailored for reinforcement learning agents operating in a black-box setting, where an adversary deliberately manipulates training data to lead the agent toward a mischievous policy. In contrast to prior studies that primarily investigate white-box settings, we focus on a scenario in which the environment dynamics are unknown to the attacker and the targeted agent may employ a flexible choice of reinforcement learning algorithm. We first propose an attack scheme that is capable of poisoning the reward functions and state transitions. The poisoning task is formalized as a constrained optimization problem, following the framework of \cite{ma2019policy}. Given that the transition probabilities are unknown to the attacker in a black-box environment, we apply a stochastic gradient descent algorithm, where the exact gradients…
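The abstract is cut off before it says how the exact gradients are handled, so the following is only a generic illustration of the idea named in the Contribution section (sample-based gradient estimation), not the paper's algorithm. It assumes a toy objective J(theta) = E[(theta - v(s'))^2], where the next state s' is drawn from a transition distribution the attacker cannot see. The names `sample_next_state`, `sgd_with_sampled_gradients`, and `values`, as well as the hidden probabilities, are hypothetical.

```python
import random


def sample_next_state(rng):
    """Stand-in for a black-box environment step (hypothetical).

    The attacker never reads these probabilities; it only observes samples.
    """
    return rng.choices([0, 1, 2], weights=[0.2, 0.5, 0.3])[0]


def sgd_with_sampled_gradients(values, steps=20000, seed=0):
    """Minimize J(theta) = E[(theta - values[s'])^2] using sampled gradients.

    The exact gradient 2 * (theta - E[values[s']]) needs the unknown
    transition probabilities, so each step uses one sampled next state
    to form an unbiased estimate of it instead.
    """
    rng = random.Random(seed)
    theta = 0.0
    for t in range(1, steps + 1):
        s_next = sample_next_state(rng)
        grad_estimate = 2.0 * (theta - values[s_next])
        step_size = 0.5 / t  # diminishing step size
        theta -= step_size * grad_estimate
    return theta


if __name__ == "__main__":
    # The minimizer of J is the expected value 0.2*1 + 0.5*2 + 0.3*4 = 2.4.
    print(sgd_with_sampled_gradients([1.0, 2.0, 4.0]))
```

With this particular step size the iterate equals the running average of the sampled values, which is why it settles near the expectation without the attacker ever knowing the transition probabilities. A constrained poisoning problem such as the one described in the abstract would additionally need a projection or penalty step, which is not shown here.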
Taxonomy
Topics: Adversarial Robustness in Machine Learning
