Learning to Detect Adversarial Examples Based on Class Scores
Tobias Uelwer, Felix Michels, Oliver De Candido

TL;DR
This paper introduces a simple yet effective method for detecting adversarial examples in deep neural networks by training an SVM on class scores, demonstrating improved detection across various attacks and models.
Contribution
The paper proposes training an SVM on the class scores of an already trained classifier to detect adversarial examples, and reports an improved detection rate compared to an existing method while remaining easy to implement.
Findings
Effective detection of various adversarial attacks
Improved detection rate compared to an existing method
Applicable to multiple deep classification models
Abstract
Given the increasing threat of adversarial attacks on deep neural networks (DNNs), research on efficient detection methods is more important than ever. In this work, we take a closer look at adversarial attack detection based on the class scores of an already trained classification model. We propose to train a support vector machine (SVM) on the class scores to detect adversarial examples. Our method is able to detect adversarial examples generated by various attacks, and can be easily adapted to a plethora of deep classification models. We show that our approach yields an improved detection rate compared to an existing method, whilst being easy to implement. We perform an extensive empirical analysis on different deep classification models, investigating various state-of-the-art adversarial attacks. Moreover, we observe that our proposed method is better at detecting a combination of…
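The core idea in the abstract, training an SVM on a classifier's class scores to flag adversarial inputs, can be illustrated with a minimal sketch. This is not the authors' implementation. Everything below is an assumption made for illustration: the class scores are synthetic (clean inputs are given a confident, peaked score vector and adversarial inputs a flatter one), the scores are sorted before training so that a linear model suffices, and the SVM is a small linear one trained by sub-gradient descent on the hinge loss. The helper names (synthetic_scores, features, train_linear_svm, is_adversarial) are hypothetical.

```python
import math
import random


def softmax(logits):
    """Turn raw logits into class scores that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def synthetic_scores(rng, n_classes, margin):
    # ASSUMPTION: stand-in for a real classifier's output. One random class
    # is boosted by `margin`; a large margin mimics a confident clean
    # prediction, a small margin mimics an input near the decision boundary.
    logits = [rng.gauss(0.0, 1.0) for _ in range(n_classes)]
    logits[rng.randrange(n_classes)] += margin
    return softmax(logits)


def features(scores):
    # ASSUMPTION: sort scores in descending order so the detector does not
    # depend on which class was predicted (a simplification of this sketch).
    return sorted(scores, reverse=True)


def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=50, seed=0):
    """Linear SVM via SGD on the L2-regularised hinge loss; labels are -1/+1."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1.0 - lr * lam) for wj in w]  # regularisation shrink
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def is_adversarial(w, b, scores):
    """Flag an input as adversarial when the SVM decision value is positive."""
    x = features(scores)
    return sum(wj * xj for wj, xj in zip(w, x)) + b > 0.0


def make_dataset(rng, n_per_class, n_classes=10):
    X, y = [], []
    for _ in range(n_per_class):
        X.append(features(synthetic_scores(rng, n_classes, margin=6.0)))
        y.append(-1)  # clean
        X.append(features(synthetic_scores(rng, n_classes, margin=1.0)))
        y.append(+1)  # adversarial
    return X, y


rng = random.Random(0)
X_train, y_train = make_dataset(rng, 200)
X_test, y_test = make_dataset(rng, 200)
w, b = train_linear_svm(X_train, y_train)
correct = sum(
    is_adversarial(w, b, x) == (label == 1) for x, label in zip(X_test, y_test)
)
accuracy = correct / len(X_test)
print(f"held-out detection accuracy: {accuracy:.3f}")
```

With a real model, the synthetic scores would be replaced by the classifier's outputs on clean inputs and on adversarial examples generated by an attack. A library SVM such as scikit-learn's sklearn.svm.SVC would be the more usual choice than a hand-written one, and the kernel and features used in the paper may differ from this sketch.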
