Adversarial Inception Backdoor Attacks against Reinforcement Learning

Ethan Rathbun; Alina Oprea; Christopher Amato

arXiv:2410.13995 · cs.LG · June 4, 2025

TL;DR

This paper introduces a new class of backdoor attacks against deep reinforcement learning (DRL), called "inception" attacks. These attacks achieve high attack success rates under strict reward constraints by manipulating the agent's training data to induce adversarial behavior.

Contribution

The work introduces inception attacks, the first backdoor attacks against DRL to maintain high attack success under strict reward constraints. It formally defines the attack and demonstrates its effectiveness across multiple environments.

Findings

1. 100% attack success rate in the tested environments
2. Minimal impact on the agent's original task performance
3. Effective under strict reward constraints

Abstract

Recent works have demonstrated the vulnerability of Deep Reinforcement Learning (DRL) algorithms against training-time backdoor poisoning attacks. The objectives of these attacks are twofold: induce pre-determined, adversarial behavior in the agent upon observing a fixed trigger during deployment, while allowing the agent to solve its intended task during training. Prior attacks assume arbitrary control over the agent's rewards, inducing values far outside the environment's natural constraints. This results in brittle attacks that fail once the proper reward constraints are enforced. Thus, in this work we propose a new class of backdoor attacks against DRL which are the first to achieve state-of-the-art performance under strict reward constraints. These "inception" attacks manipulate the agent's training data -- inserting the trigger into prior observations and replacing high return…
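The abstract's point about reward constraints can be illustrated with a small sketch. This is not the authors' inception attack. It is a hypothetical, prior-style poisoning step built on several assumptions: a 2D grayscale observation, a fixed 3x3 corner patch as the trigger, a chosen target action index, and environment reward bounds of [-1, 1]. All names (`insert_trigger`, `naive_poisoned_reward`, `clip_reward`) are illustrative. The sketch shows how enforcing the bounds collapses the inflated reward values a prior-style attack relies on.

```python
from typing import List

# Hypothetical observation type: a 2D grayscale frame with values in [0, 1].
Observation = List[List[float]]


def insert_trigger(obs: Observation, size: int = 3, value: float = 1.0) -> Observation:
    """Return a copy of obs with a fixed size x size patch in the top-left
    corner set to `value` (a hypothetical trigger, not the paper's)."""
    poisoned = [row[:] for row in obs]  # copy so the clean frame is untouched
    for i in range(min(size, len(poisoned))):
        for j in range(min(size, len(poisoned[i]))):
            poisoned[i][j] = value
    return poisoned


def naive_poisoned_reward(action: int, target_action: int, bonus: float = 100.0) -> float:
    """Prior-style manipulation (assumed): a large reward for the target action
    in triggered states, a large penalty otherwise, ignoring environment bounds."""
    return bonus if action == target_action else -bonus


def clip_reward(r: float, r_min: float, r_max: float) -> float:
    """Enforce the environment's natural reward bounds [r_min, r_max]."""
    return max(r_min, min(r_max, r))


if __name__ == "__main__":
    R_MIN, R_MAX = -1.0, 1.0  # hypothetical environment reward bounds
    frame = [[0.0] * 5 for _ in range(5)]
    triggered = insert_trigger(frame)
    print(triggered[0][:4])  # [1.0, 1.0, 1.0, 0.0]
    for action in (0, 2):
        raw = naive_poisoned_reward(action, target_action=2)
        # Once bounds are enforced, +/-100 becomes +/-1, no stronger than natural rewards.
        print(action, raw, clip_reward(raw, R_MIN, R_MAX))
    # prints "0 -100.0 -1.0" then "2 100.0 1.0"
```

Under these assumptions, the ±100 signal shrinks to the same magnitude as ordinary in-bounds rewards. This mirrors the abstract's observation that attacks relying on out-of-range rewards become brittle once constraints are enforced.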

Taxonomy

Topics: Adversarial Robustness in Machine Learning