TrojDRL: Trojan Attacks on Deep Reinforcement Learning Agents
Panagiota Kiourti, Kacper Wardega, Susmit Jha, Wenchao Li

TL;DR
This paper demonstrates that deep reinforcement learning agents are vulnerable to Trojan attacks through minimal data poisoning and reward modification, leading to hidden malicious behaviors that are hard to detect.
Contribution
The paper introduces Trojan attacks specific to DRL agents, shows that they are effective, and reports that existing Trojan defenses do not work in this setting.
Findings
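Trojan attacks can be implemented by poisoning as little as 0.025% of the training data, combined with in-band reward modification.
Trojaned policies perform like benign policies on normal inputs but deteriorate drastically when the Trojan is triggered, in both targeted and untargeted settings.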
Existing Trojan defenses are ineffective against DRL Trojan attacks.
Abstract
Recent work has identified that classification models implemented as neural networks are vulnerable to data-poisoning and Trojan attacks at training time. In this work, we show that these training-time vulnerabilities extend to deep reinforcement learning (DRL) agents and can be exploited by an adversary with access to the training process. In particular, we focus on Trojan attacks that augment the function of reinforcement learning policies with hidden behaviors. We demonstrate that such attacks can be implemented through minuscule data poisoning (as little as 0.025% of the training data) and in-band reward modification that does not affect the reward on normal inputs. The policies learned with our proposed attack approach perform imperceptibly similar to benign policies but deteriorate drastically when the Trojan is triggered in both targeted and untargeted settings. Furthermore, we…
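The poisoning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 3x3 corner patch, the pixel value 255, the targeted reward rule, and the assumption that rewards are clipped to [-1, 1] are all hypothetical choices made for illustration. The sketch stamps a trigger onto a small fraction of observations and rewrites only the rewards of those poisoned steps, keeping them inside the normal reward range ("in-band") and leaving rewards on normal inputs unchanged.

```python
import numpy as np


def stamp_trigger(obs, size=3, value=255):
    """Return a copy of obs with a size x size patch in the top-left corner.

    Patch size, position, and value are illustrative assumptions.
    """
    poisoned = obs.copy()
    poisoned[:size, :size] = value
    return poisoned


def poison_batch(observations, actions, rewards, poison_rate, target_action, rng):
    """Targeted-poisoning sketch (hypothetical helper, not the paper's code).

    A fraction `poison_rate` of the steps gets the trigger stamped on its
    observation. For those steps only, the reward becomes +1 if the agent
    took `target_action` and -1 otherwise. Both values lie inside the
    assumed clipped reward range [-1, 1], so the change is in-band.
    """
    n = len(observations)
    n_poison = int(round(poison_rate * n))
    idx = rng.choice(n, size=n_poison, replace=False)

    obs_out = observations.copy()
    rew_out = rewards.copy()
    for i in idx:
        obs_out[i] = stamp_trigger(observations[i])
        rew_out[i] = 1.0 if actions[i] == target_action else -1.0
    return obs_out, rew_out, idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_steps = 40_000
    frames = np.zeros((n_steps, 8, 8), dtype=np.uint8)   # toy observations
    acts = rng.integers(0, 4, size=n_steps)              # toy action log
    rews = np.zeros(n_steps, dtype=np.float32)           # toy clipped rewards

    _, _, poisoned_idx = poison_batch(frames, acts, rews, 0.00025, 2, rng)
    print(len(poisoned_idx), "of", n_steps, "steps poisoned")
```

At a poisoning rate of 0.025%, a log of 40,000 steps contains only 10 poisoned steps, which gives a sense of how small the footprint reported in the abstract is.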
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Security and Verification in Computing
