Learning to Detect Adversarial Examples Based on Class Scores

Tobias Uelwer; Felix Michels; Oliver De Candido

arXiv:2107.04435·cs.LG·July 12, 2021

TL;DR

This paper introduces a simple method for detecting adversarial examples aimed at deep neural networks: a support vector machine (SVM) is trained on the class scores of an already trained classifier. The authors report an improved detection rate over an existing method across several attacks and models.

Contribution

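The paper proposes training an SVM on the class scores of an already trained classification model to detect adversarial examples. The authors report a higher detection rate than an existing method and state that the approach is easy to implement and transfers to many deep classification models.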

Findings

01

Effective detection of various adversarial attacks

02

Improved detection rate compared to an existing method

03

Applicable to multiple deep classification models

Abstract

Given the increasing threat of adversarial attacks on deep neural networks (DNNs), research on efficient detection methods is more important than ever. In this work, we take a closer look at adversarial attack detection based on the class scores of an already trained classification model. We propose to train a support vector machine (SVM) on the class scores to detect adversarial examples. Our method is able to detect adversarial examples generated by various attacks, and can be easily adapted to a plethora of deep classification models. We show that our approach yields an improved detection rate compared to an existing method, whilst being easy to implement. We perform an extensive empirical analysis on different deep classification models, investigating various state-of-the-art adversarial attacks. Moreover, we observe that our proposed method is better at detecting a combination of…
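
The core idea in the abstract, fitting an SVM on a classifier's class scores to separate clean from adversarial inputs, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the class scores are synthetic (clean inputs get one dominant logit, adversarial ones do not), the scores are sorted before training, and the SVM is a linear one trained by stochastic subgradient descent on the hinge loss so that the example needs no third-party libraries. The names `synthetic_scores`, `features`, `train_linear_svm`, and `is_adversarial` are hypothetical helpers.

```python
import math
import random


def softmax(logits):
    """Turn raw logits into class scores that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def synthetic_scores(rng, adversarial, n_classes=10):
    """Hypothetical stand-in for a trained classifier's class scores."""
    logits = [rng.gauss(0.0, 1.0) for _ in range(n_classes)]
    # Assumption: clean inputs get one dominant logit, adversarial ones do not.
    logits[rng.randrange(n_classes)] += 1.0 if adversarial else 8.0
    return softmax(logits)


def features(scores):
    """Sort scores in descending order (an assumption of this sketch),
    so the feature vector does not depend on which class index wins."""
    return sorted(scores, reverse=True)


def make_dataset(rng, n_per_class):
    """Build features with labels: -1 for clean, +1 for adversarial."""
    X, y = [], []
    for _ in range(n_per_class):
        X.append(features(synthetic_scores(rng, adversarial=False)))
        y.append(-1)
        X.append(features(synthetic_scores(rng, adversarial=True)))
        y.append(+1)
    return X, y


def decision(w, b, x):
    """Signed score of the linear SVM; positive means 'adversarial'."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=1e-3, lr=0.05, epochs=100, seed=0):
    """Minimise the L2-regularised hinge loss with stochastic subgradient steps."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violated = y[i] * decision(w, b, X[i]) < 1.0
            w = [wj * (1.0 - lr * lam) for wj in w]  # regulariser shrinkage
            if violated:  # hinge term is active only inside the margin
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def is_adversarial(w, b, scores):
    """Flag an input as adversarial from its raw class scores."""
    return decision(w, b, features(scores)) > 0.0


rng = random.Random(42)
X_train, y_train = make_dataset(rng, 200)
X_test, y_test = make_dataset(rng, 100)
w, b = train_linear_svm(X_train, y_train)
correct = sum((decision(w, b, x) > 0.0) == (label == 1)
              for x, label in zip(X_test, y_test))
accuracy = correct / len(X_test)
print(f"held-out detection accuracy: {accuracy:.3f}")
```

In a real setting the synthetic generator would be replaced by class scores recorded from the trained model on clean inputs and on inputs produced by an attack, and a library SVM could stand in for the hand-written trainer. The kernel, features, and hyperparameters used by the authors are not stated in this excerpt, so the choices above should be read as placeholders.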
