Illusory Attacks: Information-Theoretic Detectability Matters in Adversarial Attacks
Tim Franzmeyer, Stephen McAleer, João F. Henriques, Jakob N. Foerster, Philip H.S. Torr, Adel Bibi, Christian Schroeder de Witt

TL;DR
This paper introduces ε-illusory, a new adversarial attack on reinforcement learning agents that is both effective and statistically harder to detect, highlighting the need for improved anomaly detection methods.
Contribution
The paper introduces ε-illusory attacks, a novel class of adversarial attacks whose statistical detectability is bounded by ε, together with a dual ascent algorithm that learns such attacks end-to-end.
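As a rough illustration of the dual ascent idea, the toy sketch below maximizes an attacker's expected utility over a single discrete observation distribution q, subject to KL(q ‖ p) ≤ ε, where p is the unattacked observation distribution and λ is the Lagrange multiplier. It assumes KL divergence as the detectability measure, and the names `dual_ascent`, `best_response` and `utility` are hypothetical. The paper's algorithm trains attack policies end-to-end over whole trajectories, so this is a minimal single-step analogue, not its implementation. For a fixed λ, the maximizer of E_q[utility] − λ·KL(q ‖ p) has the closed form q ∝ p·exp(utility/λ), and λ is then adjusted by a projected (sub)gradient step on the constraint violation.

```python
import math


def kl(q, p):
    """KL(q || p) in nats for discrete distributions given as lists of probabilities."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def best_response(p, utility, lam):
    """Closed-form maximiser of E_q[utility] - lam * KL(q || p) for a fixed lam > 0.

    q is proportional to p * exp(utility / lam); subtracting max(utility)
    keeps math.exp from overflowing when lam is small.
    """
    umax = max(utility)
    weights = [pi * math.exp((ui - umax) / lam) for pi, ui in zip(p, utility)]
    z = sum(weights)
    return [w / z for w in weights]


def dual_ascent(p, utility, eps, lam=1.0, lr=0.5, steps=2000, lam_min=1e-3):
    """Toy dual ascent: maximise E_q[utility] subject to KL(q || p) <= eps.

    Hypothetical illustration only: one discrete distribution, not a policy.
    """
    for _ in range(steps):
        q = best_response(p, utility, lam)
        # Raise lam (stronger penalty) when the detectability budget is exceeded,
        # lower it when there is slack; clip at lam_min to keep lam positive.
        lam = max(lam_min, lam + lr * (kl(q, p) - eps))
    return best_response(p, utility, lam), lam
```

With p uniform over four symbols, utility [1, 0, 0, 0] and ε = 0.1, the probability of the attacker's preferred symbol rises from 0.25 to roughly 0.46 while KL(q ‖ p) settles at ε: the attack spends exactly its detectability budget and no more.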
Findings
ε-illusory attacks are significantly harder to detect automatically than existing attacks.
Humans find ε-illusory attacks harder to detect.
The attack remains effective while keeping its statistical detectability bounded by ε.
Abstract
Autonomous agents deployed in the real world need to be robust against adversarial attacks on sensory inputs. Robustifying agent policies requires anticipating the strongest attacks possible. We demonstrate that existing observation-space attacks on reinforcement learning agents have a common weakness: while effective, their lack of information-theoretic detectability constraints makes them detectable using automated means or human inspection. Detectability is undesirable to adversaries as it may trigger security escalations. We introduce ε-illusory, a novel form of adversarial attack on sequential decision-makers that is both effective and of ε-bounded statistical detectability. We propose a novel dual ascent algorithm to learn such attacks end-to-end. Compared to existing attacks, we empirically find ε-illusory to be significantly harder to detect with…
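One standard way to see why an ε bound limits detection, assuming detectability is measured by the KL divergence between attacked and unattacked observation distributions (the paper gives the precise definition): for any detector, the true-positive rate minus the false-positive rate is at most the total variation distance TV(P, Q), and Pinsker's inequality gives TV(P, Q) ≤ √(KL/2). An attack with KL ≤ ε therefore caps every detector's advantage at √(ε/2). The sketch checks this on a hypothetical four-symbol example; the function names and distributions are illustrative and not taken from the paper.

```python
import math


def kl(q, p):
    """KL(q || p) in nats for discrete distributions."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def total_variation(q, p):
    """Total variation distance: half the L1 distance between q and p."""
    return 0.5 * sum(abs(qi - pi) for qi, pi in zip(q, p))


def best_detector_advantage(q, p):
    """TPR - FPR of the optimal single-observation detector.

    Flagging an observation x as attacked exactly when q(x) > p(x)
    maximises TPR - FPR, and that maximum equals TV(q, p).
    """
    flagged = [i for i, (qi, pi) in enumerate(zip(q, p)) if qi > pi]
    tpr = sum(q[i] for i in flagged)
    fpr = sum(p[i] for i in flagged)
    return tpr - fpr


def pinsker_bound(eps):
    """Upper bound on any detector's advantage when KL <= eps (Pinsker's inequality)."""
    return math.sqrt(eps / 2.0)


clean = [0.25, 0.25, 0.25, 0.25]     # hypothetical unattacked observation distribution
attacked = [0.4, 0.2, 0.2, 0.2]      # hypothetical attacked observation distribution
advantage = best_detector_advantage(attacked, clean)
bound = pinsker_bound(kl(attacked, clean))
print(f"best advantage {advantage:.3f} <= Pinsker bound {bound:.3f}")
```

Here the best single-observation detector achieves an advantage of 0.15, just under the Pinsker bound of about 0.164. The bound depends only on the divergence, so it holds for every detector, automated or human, that sees the same observations.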
Taxonomy
Topics: Adversarial Robustness in Machine Learning
