Design of intentional backdoors in sequential models
Zhaoyuan Yang, Naresh Iyer, Johan Reimann, Nurali Virani

TL;DR
This paper introduces backdoor attack methods for sequential decision-making models, such as LSTM-based reinforcement learning agents, demonstrates their effectiveness, and discusses potential defenses.
Contribution
The paper extends backdoor attack techniques from classification problems to sequential decision-making models, specifically LSTM-based reinforcement learning agents, a setting that prior research had left underexplored.
Findings
Effective backdoor attacks demonstrated on grid-world environments
Activation of trojan triggers and malicious policies explained
Challenges with network size and unintentional triggers identified
Abstract
Recent work has demonstrated robust mechanisms by which attacks can be orchestrated on machine learning models. In contrast to adversarial examples, backdoor or trojan attacks embed surgically modified samples with targeted labels in the model training process to cause the targeted model to learn to misclassify chosen samples in the presence of specific triggers, while keeping the model performance stable across other nominal samples. However, current published research on trojan attacks mainly focuses on classification problems, which ignores sequential dependency between inputs. In this paper, we propose methods to discreetly introduce and exploit novel backdoor attacks within a sequential decision-making agent, such as a reinforcement learning agent, by training multiple benign and malicious policies within a single long short-term memory (LSTM) network. We demonstrate the…
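To make the idea of a trigger-conditioned policy switch concrete, here is a minimal sketch of how poisoned training episodes for a grid-world agent might be generated. It is not the authors' implementation: the grid layout, the `TRIGGER` marker value, the greedy stand-in policies, and the function names (`greedy_action`, `step`, `make_episode`) are all assumptions made for illustration. The point it shows is that the trigger is visible in only one observation, yet every later target action follows the malicious goal, so a model needs memory (for example, an LSTM) to fit such data.

```python
# Hypothetical illustration only; names and values are assumptions, not from the paper.
TRIGGER = 9  # assumed marker value written into the observation when the trigger is shown

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1), "stay": (0, 0)}


def greedy_action(pos, target):
    """Stand-in policy: move along rows first, then columns, toward `target`."""
    (r, c), (tr, tc) = pos, target
    if r != tr:
        return "down" if r < tr else "up"
    if c != tc:
        return "right" if c < tc else "left"
    return "stay"


def step(pos, action, size):
    """Apply an action on a size x size grid, clamping at the walls."""
    dr, dc = MOVES[action]
    r = min(max(pos[0] + dr, 0), size - 1)
    c = min(max(pos[1] + dc, 0), size - 1)
    return (r, c)


def make_episode(size, start, goal, bad_goal, trigger_step=None, horizon=12):
    """Return a list of (observation, target_action) pairs.

    If `trigger_step` is set, the trigger appears in that single observation
    and all target actions from then on pursue `bad_goal` instead of `goal`.
    """
    pos, triggered, episode = start, False, []
    for t in range(horizon):
        shows_trigger = (t == trigger_step)
        triggered = triggered or shows_trigger
        obs = (pos[0], pos[1], TRIGGER if shows_trigger else 0)
        action = greedy_action(pos, bad_goal if triggered else goal)
        episode.append((obs, action))
        pos = step(pos, action, size)
    return episode


clean = make_episode(5, (0, 0), (4, 4), (0, 4))
poisoned = make_episode(5, (0, 0), (4, 4), (0, 4), trigger_step=3)
```

In this sketch the clean and poisoned episodes are identical before step 3. From step 4 onward the poisoned observations carry no trigger marker, but their target actions still head toward `bad_goal`, which is the sequential dependency a per-sample classifier backdoor does not capture.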
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Explainable Artificial Intelligence (XAI)
