Illusory Attacks: Information-Theoretic Detectability Matters in Adversarial Attacks

Tim Franzmeyer; Stephen McAleer; João F. Henriques; Jakob N. Foerster; Philip H.S. Torr; Adel Bibi; Christian Schroeder de Witt

arXiv:2207.10170 · cs.AI · May 7, 2024

Open Access

TL;DR

This paper introduces ε-illusory, a new adversarial attack on reinforcement learning agents that is both effective and statistically harder to detect, highlighting the need for improved anomaly detection methods.

Contribution

The paper presents ε-illusory, a novel attack with ε-bounded statistical detectability, together with a dual ascent algorithm that learns such attacks end-to-end.

Findings

01

ε-illusory attacks are significantly harder to detect automatically than existing observation-space attacks.

02

Humans find ε-illusory attacks more difficult to detect.

03

The attacks remain effective while their statistical detectability stays bounded by ε.
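A standard information-theoretic argument shows why bounding detectability by ε limits what any detector can achieve. As a worked sketch, assume (as an interpretation, not a quote of the paper's definition) that ε bounds the KL divergence between the distribution Q of attacked observation sequences and the distribution P of clean ones. For any test deciding between P and Q, with false-alarm probability α and miss probability β, Le Cam's bound and Pinsker's inequality (KL in nats) give:

```latex
\alpha + \beta \;\ge\; 1 - \mathrm{TV}(P, Q) \;\ge\; 1 - \sqrt{\tfrac{1}{2}\,\mathrm{KL}(Q \,\|\, P)} \;\ge\; 1 - \sqrt{\epsilon / 2}
```

For example, with ε = 0.1 the right-hand side is 1 - sqrt(0.05) ≈ 0.776, so every detector's false-alarm and miss rates sum to at least about 0.78, close to the value of 1 obtained by ignoring the observations entirely.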

Abstract

Autonomous agents deployed in the real world need to be robust against adversarial attacks on sensory inputs. Robustifying agent policies requires anticipating the strongest attacks possible. We demonstrate that existing observation-space attacks on reinforcement learning agents have a common weakness: while effective, their lack of information-theoretic detectability constraints makes them detectable using automated means or human inspection. Detectability is undesirable to adversaries as it may trigger security escalations. We introduce ε-illusory, a novel form of adversarial attack on sequential decision-makers that is both effective and of ε-bounded statistical detectability. We propose a novel dual ascent algorithm to learn such attacks end-to-end. Compared to existing attacks, we empirically find ε-illusory to be significantly harder to detect with…
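The abstract names a dual ascent algorithm without spelling it out. The sketch below illustrates only the general constrained-optimization idea on a deliberately tiny, single-step problem; it is not the paper's sequential, end-to-end learned attack. It assumes, hypothetically, that detectability is the KL divergence between the attacked observation distribution q and the clean distribution p, and that the adversary earns a fixed reward per observation. Posing the problem as minimizing -E_q[reward] subject to KL(q‖p) ≤ ε, each iteration solves the inner problem exactly (q proportional to p·exp(reward/λ)) and then takes a gradient step on the dual variable λ, raising it whenever the detectability budget is exceeded. The function names `tilted`, `kl` and `dual_ascent`, and all numbers, are illustrative.

```python
import math


def tilted(p, reward, lam):
    """Exact inner step (hypothetical toy): argmax_q E_q[reward] - lam * KL(q || p),
    whose solution is q proportional to p * exp(reward / lam)."""
    logits = [math.log(pi) + ri / lam for pi, ri in zip(p, reward)]
    m = max(logits)  # subtract the max for numerical stability
    w = [math.exp(l - m) for l in logits]
    z = sum(w)
    return [wi / z for wi in w]


def kl(q, p):
    """KL(q || p) in nats; terms with q_i == 0 contribute nothing."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def dual_ascent(p, reward, epsilon, lam=1.0, step=1.0, iters=500, lam_min=1e-3):
    """Dual ascent on the Lagrangian -E_q[reward] + lam * (KL(q || p) - epsilon)."""
    for _ in range(iters):
        q = tilted(p, reward, lam)
        # Dual gradient step: grow lam if the detectability budget is exceeded,
        # shrink it otherwise; the floor keeps lam > 0 so reward / lam is defined.
        lam = max(lam_min, lam + step * (kl(q, p) - epsilon))
    return tilted(p, reward, lam), lam


if __name__ == "__main__":
    p = [0.5, 0.5]        # hypothetical clean observation distribution
    reward = [0.0, 1.0]   # hypothetical adversary reward per observation
    q, lam = dual_ascent(p, reward, epsilon=0.1)
    print(q, lam, kl(q, p))
```

With p = [0.5, 0.5], reward = [0, 1] and ε = 0.1, λ settles near 1.06 and q places roughly 72% of its mass on the rewarded observation, with KL(q‖p) sitting exactly at the budget. Solving the inner problem in closed form keeps the toy exact; in the paper's setting the inner step would instead be a learned attack policy.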

Taxonomy

Topics: Adversarial Robustness in Machine Learning