Attack Agnostic Statistical Method for Adversarial Detection
Sambuddha Saha, Aashish Kumar, Pratyush Sahay, George Jose, Srinivas Kruthiventi, Harikrishna Muralidhara

TL;DR
This paper introduces a statistical method for detecting adversarial inputs in image classification by comparing feature distributions, demonstrating effectiveness across multiple datasets and attack types.
Contribution
A novel, attack-agnostic statistical approach for adversarial detection that utilizes feature distribution comparison with various statistical distances.
Findings
Effective detection on MNIST and CIFAR-10 datasets
Performance is robust across different attack methods and perturbation levels
Uses statistical distances such as Energy Distance (ED) and Maximum Mean Discrepancy (MMD) for detection
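To make the two distances named above concrete, here is a minimal NumPy sketch of sample-based estimators for energy distance and MMD. It is an illustration under stated assumptions, not the paper's implementation: the function names, the Gaussian kernel, and the `bandwidth` parameter are hypothetical choices, and the summary does not say which kernel or estimator the authors use.

```python
import numpy as np


def pairwise_dists(a, b):
    """Euclidean distance between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def energy_distance(x, y):
    """Squared energy distance estimate: 2 E|X-Y| - E|X-X'| - E|Y-Y'|.

    Uses plain means over all pairs (a V-statistic), which is non-negative.
    """
    return (2.0 * pairwise_dists(x, y).mean()
            - pairwise_dists(x, x).mean()
            - pairwise_dists(y, y).mean())


def mmd_rbf(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD with a Gaussian (RBF) kernel.

    The kernel and bandwidth are assumptions made for this sketch.
    """
    def kernel(a, b):
        return np.exp(-pairwise_dists(a, b) ** 2 / (2.0 * bandwidth ** 2))

    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()
```

Both functions take two arrays of shape `(n_samples, n_features)`. Each returns a value close to zero when the two samples come from the same distribution and a larger value as the distributions move apart.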
Abstract
Deep Learning based AI systems have shown great promise in various domains such as vision, audio, autonomous systems (vehicles, drones), etc. Recent research on neural networks has shown the susceptibility of deep networks to adversarial attacks - a technique of adding small perturbations to the inputs which can fool a deep network into misclassifying them. Developing defenses against such adversarial attacks is an active research area, with some approaches proposing robust models that are immune to such adversaries, while other techniques attempt to detect such adversarial inputs. In this paper, we present a novel statistical approach for adversarial detection in image classification. Our approach is based on constructing a per-class feature distribution and detecting adversaries based on comparison of features of a test image with the feature distribution of its class. For this…
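The per-class comparison described in the abstract could be sketched as follows. This is a hedged illustration only: the abstract is truncated before it gives the details, so the scoring rule (an energy-distance-style score of one feature vector against a class sample), the quantile-based threshold, and all function and parameter names below are assumptions rather than the authors' procedure.

```python
import numpy as np


def point_to_class_score(feature, class_features):
    """Energy-distance-style score of one feature vector against a class sample.

    Larger scores mean the feature sits farther from the class distribution.
    """
    d_xy = np.linalg.norm(class_features - feature, axis=1).mean()
    diffs = class_features[:, None, :] - class_features[None, :, :]
    d_yy = np.sqrt((diffs ** 2).sum(axis=-1)).mean()
    return 2.0 * d_xy - d_yy


def fit_thresholds(features_by_class, held_out_by_class, quantile=0.95):
    """Pick a per-class threshold from clean held-out features (assumed setup)."""
    thresholds = {}
    for label, reference in features_by_class.items():
        scores = [point_to_class_score(f, reference)
                  for f in held_out_by_class[label]]
        thresholds[label] = float(np.quantile(scores, quantile))
    return thresholds


def is_adversarial(feature, predicted_label, features_by_class, thresholds):
    """Flag an input whose features are unusually far from its predicted class."""
    score = point_to_class_score(feature, features_by_class[predicted_label])
    return score > thresholds[predicted_label]
```

Here `features_by_class` maps each class label to an array of features extracted from clean training images of that class. A test image is scored only against the class the classifier predicted for it, so the check needs no knowledge of the attack that produced the input.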
