Training Ensembles to Detect Adversarial Examples

Alexander Bagnall; Razvan Bunescu; Gordon Stewart

arXiv:1712.04006 · cs.LG · December 13, 2017 · 30 cites

TL;DR

This paper introduces an ensemble approach that trains multiple models to detect adversarial examples by reducing agreement on out-of-distribution inputs, effectively identifying attacks like DeepFool and C&W on MNIST and CIFAR-10.

Contribution

The paper presents an ensemble training method for adversarial detection that balances two objectives: low classification error on benign data and low agreement among ensemble members on inputs outside the training distribution.
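The trade-off described above can be illustrated with a small numerical sketch. This is not the paper's loss function: the agreement measure (the mean dot product of two members' class probabilities), the `weight` parameter, and all function names are assumptions chosen only to show how an error term and an agreement penalty might be combined.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean cross-entropy for one member. probs: (n_inputs, n_classes)."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked + 1e-12).mean())

def pairwise_agreement(probs):
    """Hypothetical agreement score. probs: (n_members, n_inputs, n_classes).

    For each pair of members, the dot product of their class probabilities
    is the chance that both would pick the same class if each sampled from
    its own output distribution. The score averages this over inputs and pairs.
    """
    n_members = probs.shape[0]
    if n_members < 2:
        raise ValueError("need at least two ensemble members")
    total, pairs = 0.0, 0
    for i in range(n_members):
        for j in range(i + 1, n_members):
            total += float((probs[i] * probs[j]).sum(axis=1).mean())
            pairs += 1
    return total / pairs

def ensemble_loss(benign_probs, labels, ood_probs, weight=1.0):
    """Assumed objective: benign error plus weighted agreement on OOD inputs."""
    error = np.mean([cross_entropy(p, labels) for p in benign_probs])
    return float(error + weight * pairwise_agreement(ood_probs))

# Two members, one benign input (true class 0), one out-of-distribution input.
benign = np.array([[[0.9, 0.1]], [[0.8, 0.2]]])
labels = np.array([0])
ood_agree = np.array([[[1.0, 0.0]], [[1.0, 0.0]]])     # members agree
ood_disagree = np.array([[[1.0, 0.0]], [[0.0, 1.0]]])  # members disagree

loss_agree = ensemble_loss(benign, labels, ood_agree)
loss_disagree = ensemble_loss(benign, labels, ood_disagree)
```

Under these assumptions, the ensemble whose members disagree on the out-of-distribution input receives the lower loss, while the benign error term is identical in both cases.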

Findings

01. Effective detection of adversarial examples across multiple attack types

02. High accuracy on MNIST and CIFAR-10 datasets

03. Evaluated against oblivious, white-box, and black-box adversaries

Abstract

We propose a new ensemble method for detecting and classifying adversarial examples generated by state-of-the-art attacks, including DeepFool and C&W. Our method works by training the members of an ensemble to have low classification error on random benign examples while simultaneously minimizing agreement on examples outside the training distribution. We evaluate on both MNIST and CIFAR-10, against oblivious and both white- and black-box adversaries.
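As a rough illustration of how ensemble disagreement could serve as a detection signal, the sketch below flags an input when too few members vote for the same class. The voting rule, the `min_agreement` threshold, and the function name `ensemble_detect` are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def ensemble_detect(probs, min_agreement=1.0):
    """Flag inputs on which ensemble members disagree.

    probs: array of shape (n_members, n_inputs, n_classes) holding each
    member's class probabilities. min_agreement is a hypothetical threshold:
    the fraction of members that must vote for the majority class.
    """
    n_members, _, n_classes = probs.shape
    votes = probs.argmax(axis=2)  # (n_members, n_inputs)
    # Count the votes each class receives for each input.
    counts = np.stack(
        [(votes == c).sum(axis=0) for c in range(n_classes)], axis=1
    )  # (n_inputs, n_classes)
    majority = counts.argmax(axis=1)
    agreement = counts.max(axis=1) / n_members
    flagged = agreement < min_agreement
    return majority, agreement, flagged

# Three members, two inputs, three classes (made-up probabilities).
probs = np.array([
    [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]],
    [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1]],
    [[0.7, 0.2, 0.1], [0.2, 0.2, 0.6]],
])
majority, agreement, flagged = ensemble_detect(probs)
```

In this toy data all three members pick class 0 for the first input, so it is accepted, while each member picks a different class for the second input, so it is flagged. Requiring unanimous agreement is the strictest setting; a lower `min_agreement` would accept more inputs at the cost of missing more adversarial ones.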

Code & Models

Repositories

bagnalla/ensemble_detect_adv (tf · Official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Bacillus and Francisella bacterial research