TrojDRL: Trojan Attacks on Deep Reinforcement Learning Agents

Panagiota Kiourti; Kacper Wardega; Susmit Jha; Wenchao Li

arXiv:1903.06638 · cs.CR · March 18, 2019 · 24 cites

TL;DR

This paper demonstrates that deep reinforcement learning agents are vulnerable to Trojan attacks through minimal data poisoning and reward modification, leading to hidden malicious behaviors that are hard to detect.

Contribution

The paper introduces Trojan attacks tailored to DRL agents, demonstrates their effectiveness, and shows that existing Trojan defenses are ineffective against them.

Findings

01

Trojan attacks can be implemented by poisoning as little as 0.025% of the training data.

02

Attacks cause drastic policy deterioration when triggered.

03

Existing Trojan defenses are ineffective against DRL Trojan attacks.

Abstract

Recent work has identified that classification models implemented as neural networks are vulnerable to data-poisoning and Trojan attacks at training time. In this work, we show that these training-time vulnerabilities extend to deep reinforcement learning (DRL) agents and can be exploited by an adversary with access to the training process. In particular, we focus on Trojan attacks that augment the function of reinforcement learning policies with hidden behaviors. We demonstrate that such attacks can be implemented through minuscule data poisoning (as little as 0.025% of the training data) and in-band reward modification that does not affect the reward on normal inputs. The policies learned with our proposed attack approach perform imperceptibly similar to benign policies but deteriorate drastically when the Trojan is triggered in both targeted and untargeted settings. Furthermore, we…
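As a rough illustration of the two ingredients the abstract names, data poisoning and in-band reward modification, the following Python sketch poisons a small fraction of stored transitions. Everything specific here is an assumption made for illustration and is not taken from the abstract: the square corner patch used as the trigger, the helper names `stamp_trigger` and `poison_batch`, the `target_action` parameter, the rule that rewards the target action and penalizes all others, and the reading of "in-band" as keeping modified rewards inside the normal reward range `[-r_max, r_max]`. Rewards on unpoisoned transitions are left untouched.

```python
import numpy as np


def stamp_trigger(state, size=3, value=255):
    """Return a copy of an image-like state with a square patch in the top-left corner.

    The patch shape, size, and value are hypothetical choices for this sketch.
    """
    poisoned = state.copy()
    poisoned[:size, :size] = value
    return poisoned


def poison_batch(states, actions, rewards, poison_rate, target_action, rng, r_max=1.0):
    """Poison a random subset of transitions (sketch of a targeted attack).

    Modified rewards stay within [-r_max, r_max], which is how this sketch
    interprets "in-band"; all other transitions are returned unchanged.
    """
    states = states.copy()
    rewards = rewards.copy()
    n = len(states)
    n_poison = int(round(poison_rate * n))
    idx = rng.choice(n, size=n_poison, replace=False)
    for i in idx:
        states[i] = stamp_trigger(states[i])
        # Assumed rule: reward the target action on triggered states, penalize the rest.
        rewards[i] = r_max if actions[i] == target_action else -r_max
    return states, rewards, idx


# Toy data: 4000 random 8x8 "frames" with pixel values below 200,
# so the 255-valued trigger patch is unambiguous.
rng = np.random.default_rng(0)
states = rng.integers(0, 200, size=(4000, 8, 8)).astype(np.uint8)
actions = rng.integers(0, 4, size=4000)
rewards = np.zeros(4000, dtype=np.float32)

# 0.025% expressed as a fraction is 0.00025.
p_states, p_rewards, idx = poison_batch(
    states, actions, rewards, poison_rate=0.00025, target_action=2, rng=rng
)
print("poisoned transitions:", len(idx))
```

At a 0.025% rate, a batch of 4000 transitions contains a single poisoned sample, which gives a sense of how sparse the poisoning described in the abstract is.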

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Security and Verification in Computing