SleeperNets: Universal Backdoor Poisoning Attacks Against Reinforcement Learning Agents

Ethan Rathbun; Christopher Amato; Alina Oprea

arXiv:2405.20539·cs.LG·October 22, 2024

TL;DR

This paper introduces SleeperNets, a universal backdoor poisoning attack against reinforcement learning agents, and shows that it succeeds across multiple environments while preserving the agent's benign episodic return.

Contribution

The work presents a novel attack framework with theoretical guarantees and develops SleeperNets, a universal backdoor method exploiting dynamic reward poisoning in RL.

Findings

01 Significant improvement in attack success rate over existing methods

02 Maintains benign episodic return during attacks

03 Effective across multiple diverse environments

Abstract

Reinforcement learning (RL) is an actively growing field that is seeing increased usage in real-world, safety-critical applications -- making it paramount to ensure the robustness of RL algorithms against adversarial attacks. In this work we explore a particularly stealthy form of training-time attacks against RL -- backdoor poisoning. Here the adversary intercepts the training of an RL agent with the goal of reliably inducing a particular action when the agent observes a pre-determined trigger at inference time. We uncover theoretical limitations of prior work by proving their inability to generalize across domains and MDPs. Motivated by this, we formulate a novel poisoning attack framework which interlinks the adversary's objectives with those of finding an optimal policy -- guaranteeing attack success in the limit. Using insights from our theoretical analysis we develop…
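To make the threat model in the abstract concrete, here is a minimal sketch of what poisoning a training transition could look like. It is a generic illustration with a simple fixed-reward scheme, not the SleeperNets algorithm, whose dynamic reward poisoning is not specified in the text above. All names (`apply_trigger`, `poison_transition`, `poison_batch`, `bonus`, `rate`) and the choice of trigger (overwriting one observation entry) are assumptions made for this example.

```python
import random


def apply_trigger(observation, trigger_value=1.0, trigger_index=0):
    """Return a copy of the observation with a fixed pattern stamped into it.

    Hypothetical trigger: overwrite a single entry with a constant.
    """
    poisoned = list(observation)
    poisoned[trigger_index] = trigger_value
    return poisoned


def poison_transition(observation, action, reward, target_action, bonus=1.0):
    """Stamp the trigger and replace the reward so the target action is favored.

    Assumed fixed-reward scheme: +bonus if the agent took the target action
    in the triggered state, -bonus otherwise. The original reward is discarded.
    """
    poisoned_obs = apply_trigger(observation)
    poisoned_reward = bonus if action == target_action else -bonus
    return poisoned_obs, action, poisoned_reward


def poison_batch(transitions, target_action, rate, rng):
    """Poison each (observation, action, reward) tuple with probability `rate`."""
    result = []
    for obs, action, reward in transitions:
        if rng.random() < rate:
            result.append(poison_transition(obs, action, reward, target_action))
        else:
            result.append((list(obs), action, reward))
    return result


if __name__ == "__main__":
    rng = random.Random(0)
    batch = [([0.0, 0.5], 2, 0.3), ([0.2, 0.1], 1, 0.0)]
    for item in poison_batch(batch, target_action=2, rate=0.5, rng=rng):
        print(item)
```

A low `rate` leaves most transitions untouched, which is one plausible way an adversary could keep the agent's behavior on clean observations largely intact while still associating the trigger with the target action.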

Videos

SleeperNets: Universal Backdoor Poisoning Attacks Against Reinforcement Learning Agents (SlidesLive)

Taxonomy

Topics: Advanced Malware Detection Techniques · Adversarial Robustness in Machine Learning · Network Security and Intrusion Detection