Backdoors in DRL: Four Environments Focusing on In-distribution Triggers

Chace Ashcraft, Ted Staley, Josh Carney, Cameron Hickert, Derek Juba, Kiran Karra, and Nathan Drenkow

arXiv:2505.17248·cs.LG·December 16, 2025

TL;DR

This paper investigates backdoor attacks in deep reinforcement learning (DRL) and demonstrates, across four diverse environments, that in-distribution triggers pose a significant security threat even though they are harder to implement than out-of-distribution triggers.

Contribution

The authors develop and analyze in-distribution backdoor attacks in four DRL environments, highlighting their viability and challenges compared to out-of-distribution triggers.

Findings

01

In-distribution triggers are feasible and pose security risks in DRL.

02

Implementing in-distribution triggers requires additional effort but remains effective.

03

Backdoor attacks can be successful even with basic data poisoning methods.
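As a rough illustration of what "basic data poisoning" with an in-distribution trigger could look like, the sketch below rewrites rewards in a batch of collected transitions so that a target action is favored whenever a naturally occurring observation pattern is present. This is not the authors' implementation: the `Transition` type, the `poison_rewards` and `has_trigger` functions, the tuple-based observation format, and the reward values are all hypothetical and chosen only to keep the example small. Reward poisoning is assumed here as one common form of data poisoning.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Sequence, Tuple

# Hypothetical observation format: a flat tuple of discrete cell values.
Observation = Tuple[int, ...]


@dataclass(frozen=True)
class Transition:
    """One (observation, action, reward) sample from the agent's experience."""
    obs: Observation
    action: int
    reward: float


def poison_rewards(
    transitions: Sequence[Transition],
    trigger: Callable[[Observation], bool],
    target_action: int,
    bonus: float = 1.0,
    penalty: float = -1.0,
) -> List[Transition]:
    """Return a copy of the data with rewards rewritten where the trigger holds.

    Observations are never modified: the trigger is a pattern that already
    occurs in the data, which is what makes it in-distribution.
    """
    poisoned = []
    for t in transitions:
        if trigger(t.obs):
            # Reward the attacker's target action, punish everything else.
            new_reward = bonus if t.action == target_action else penalty
            poisoned.append(replace(t, reward=new_reward))
        else:
            poisoned.append(t)  # clean samples pass through unchanged
    return poisoned


def has_trigger(obs: Observation) -> bool:
    """Hypothetical trigger: the first three cells form the pattern 1, 0, 1."""
    return obs[:3] == (1, 0, 1)


batch = [
    Transition(obs=(1, 0, 1, 0), action=2, reward=0.0),  # trigger, target action
    Transition(obs=(1, 0, 1, 1), action=0, reward=0.5),  # trigger, other action
    Transition(obs=(0, 0, 1, 0), action=2, reward=0.1),  # no trigger
]
poisoned_batch = poison_rewards(batch, has_trigger, target_action=2)
```

Because the observations are left untouched, the trigger pattern may be rare in naturally collected data. That rarity is one plausible reason an in-distribution attack could take more effort than stamping an artificial patch onto observations.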

Abstract

Backdoor attacks, or trojans, pose a security risk by concealing undesirable behavior in deep neural network models. Open-source neural networks, which may contain backdoors, are downloaded from the internet daily, and reliance on third-party model developers is common. To advance research on backdoor attack mitigation, we develop several trojans for deep reinforcement learning (DRL) agents. We focus on in-distribution triggers, which occur within the agent's natural data distribution, since they pose a more significant security threat than out-of-distribution triggers due to their ease of activation by the attacker during model deployment. We implement backdoor attacks in four reinforcement learning (RL) environments: LavaWorld, Randomized LavaWorld, Colorful Memory, and Modified Safety Gymnasium. We train various models, both clean and backdoored, to characterize these attacks. We find that…
