Detecting Adversarial Data via Provable Adversarial Noise Amplification
Furkan Mumcu, Yasin Yilmaz

TL;DR
This paper provides a formal mathematical analysis of adversarial noise amplification in neural networks, proposing a novel detection method that enhances robustness against adversarial attacks.
Contribution
It introduces a formal theorem on adversarial noise amplification, a new training method with spectral loss, and a lightweight detection mechanism for adversarial inputs.
Findings
Theorem specifies sufficient conditions under which adversarial noise amplification is guaranteed.
Proposed detector effectively identifies adversarial inputs.
Method outperforms existing defenses against adaptive attacks.
Abstract
The nonuniform and growing impact of adversarial noise across the layers of deep neural networks has been used in the literature, without a formal mathematical justification, to detect adversarial inputs and improve robustness. In this work, we study this phenomenon in detail and present a formal adversarial noise amplification theorem. We specify a set of sufficient conditions under which the adversarial noise amplification is mathematically guaranteed. Based on theoretical observations, we propose a novel training methodology with a custom spectral loss function and a specific architectural design to enhance the amplification signal for detecting adversarial data. Finally, we introduce a new, lightweight detection mechanism that leverages the enhanced amplification signal and operates entirely at inference time. To validate our approach, we demonstrate the detector's efficacy against…
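As a rough illustration of the quantity the abstract refers to, the sketch below measures how a small input perturbation grows or shrinks, relative to the clean activation, at each layer of a toy ReLU network. Everything here is an assumption made for illustration: the function name `layerwise_noise_ratio`, the random He-scaled weights, the layer sizes, and the use of random rather than adversarial noise. It does not reproduce the paper's theorem, spectral loss, architecture, or detector.

```python
import numpy as np


def layerwise_noise_ratio(weights, x, delta):
    """Return ||h_l(x + delta) - h_l(x)|| / ||h_l(x)|| for every layer l.

    `weights` is a list of square matrices; each layer is h -> relu(W @ h).
    This is a hypothetical helper for illustration, not the paper's method.
    """
    clean, noisy = x, x + delta
    ratios = []
    for W in weights:
        # Propagate the clean and perturbed inputs through the same layer.
        clean = np.maximum(W @ clean, 0.0)
        noisy = np.maximum(W @ noisy, 0.0)
        # Small constant guards against division by zero.
        denom = np.linalg.norm(clean) + 1e-12
        ratios.append(float(np.linalg.norm(noisy - clean) / denom))
    return ratios


rng = np.random.default_rng(0)
dim, depth = 64, 6  # assumed toy sizes

# Random He-scaled weights stand in for a trained network (assumption).
weights = [
    rng.normal(scale=np.sqrt(2.0 / dim), size=(dim, dim)) for _ in range(depth)
]
x = rng.normal(size=dim)
delta = 0.01 * rng.normal(size=dim)  # random noise, not an actual attack

ratios = layerwise_noise_ratio(weights, x, delta)
for i, r in enumerate(ratios, start=1):
    print(f"layer {i}: relative noise = {r:.4f}")
```

A detector in this spirit would compare such per-layer ratios, or a statistic derived from them, against values seen on clean data. How the paper actually builds and thresholds its signal is not stated in the truncated abstract, so that part is left out.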
