DLA: Dense-Layer-Analysis for Adversarial Example Detection
Philip Sperl, Ching-Yu Kao, Peng Chen, Konstantin Böttinger

TL;DR
This paper introduces DLA, a real-time framework for detecting adversarial examples in DNNs. A secondary neural network analyzes the target model's dense-layer activations to identify malicious inputs across multiple domains.
Contribution
The paper presents a novel end-to-end method that leverages dense layer activation patterns for effective adversarial example detection without impacting model performance.
Findings
Successfully detects adversarial examples in image, language, and audio domains.
Generalizes detection capability across different attack types and datasets.
Resilient against white-box adaptive attacks.
Abstract
In recent years, Deep Neural Networks (DNNs) have achieved remarkable results and even showed super-human capabilities in a broad range of domains. This led people to trust in DNNs' classifications and resulting actions even in security-sensitive environments like autonomous driving. Despite their impressive achievements, DNNs are known to be vulnerable to adversarial examples. Such inputs contain small perturbations to intentionally fool the attacked model. In this paper, we present a novel end-to-end framework to detect such attacks during classification without influencing the target model's performance. Inspired by recent research in neuron-coverage guided testing, we show that dense layers of DNNs carry security-sensitive information. With a secondary DNN we analyze the activation patterns of the dense layers during classification runtime, which enables effective and real-time…
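The idea described in the abstract, reading out the target model's dense-layer activations and passing them to a second model that flags suspicious inputs, can be illustrated with a minimal NumPy sketch. Everything here is an assumption for illustration and not the authors' implementation: the target is a tiny randomly initialized MLP, the detector is a logistic regression standing in for the secondary DNN, and the "perturbed" inputs are benign inputs plus random noise rather than the output of a real attack. The names dense_activations, train_detector, and detect are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target model: a tiny MLP with two dense layers and random weights.
W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 4)), np.zeros(4)

def dense_activations(x):
    """Run the target model and return its concatenated dense-layer activations."""
    h1 = np.maximum(x @ W1 + b1, 0.0)  # hidden dense layer with ReLU
    h2 = h1 @ W2 + b2                  # output dense layer (logits)
    return np.concatenate([h1, h2], axis=1)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-np.clip(t, -50.0, 50.0)))

def train_detector(feats, labels, lr=0.1, epochs=200):
    """Fit a logistic-regression detector (a stand-in for the secondary DNN)."""
    mu, sigma = feats.mean(axis=0), feats.std(axis=0) + 1e-8
    z = (feats - mu) / sigma           # standardize the activation features
    w, b = np.zeros(z.shape[1]), 0.0
    for _ in range(epochs):
        grad = sigmoid(z @ w + b) - labels   # gradient of the cross-entropy loss
        w -= lr * (z.T @ grad) / len(labels)
        b -= lr * grad.mean()
    return mu, sigma, w, b

def detect(x, params):
    """Return a score in [0, 1]; higher means 'more likely perturbed'."""
    mu, sigma, w, b = params
    return sigmoid(((dense_activations(x) - mu) / sigma) @ w + b)

# Placeholder data: random noise is NOT a real adversarial attack.
benign = rng.normal(size=(200, 8))
perturbed = benign + rng.normal(scale=2.0, size=benign.shape)
feats = dense_activations(np.vstack([benign, perturbed]))
labels = np.concatenate([np.zeros(200), np.ones(200)])

params = train_detector(feats, labels)
scores = detect(rng.normal(size=(5, 8)), params)
```

Because the detector only reads activations that the target model computes anyway, the target's own predictions are untouched in this sketch, which mirrors the abstract's claim of detection without influencing the target model's performance.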
