HOLMES: to Detect Adversarial Examples with Multiple Detectors
Jing Wen

TL;DR
HOLMES is a lightweight, hierarchical system that detects adversarial examples by analyzing DNN output logits, achieving high accuracy and low false positives across multiple attack types without modifying the original models.
Contribution
The paper introduces HOLMES, a novel external detection system using multiple logit-based detectors, enhancing robustness against unseen adversarial examples without altering DNNs.
Findings
HOLMES achieves higher detection accuracy than single detectors.
HOLMES maintains low false positive rates across various attacks.
Because it relies only on output logits, HOLMES can be applied to different learning models, including those accessed only through external APIs.
Abstract
Deep neural networks (DNNs) can easily be fooled by imperceptible but purposeful noise added to images, causing them to misclassify the images. Previous defensive work has mostly focused on retraining the models or detecting the noise, but has either shown limited success rates or been defeated by new adversarial examples. Instead of focusing on adversarial images or the interior of DNN models, we observed that adversarial examples generated by different algorithms can be identified based on the output of DNNs (logits). Logits can serve as an exterior feature to train detectors. We then propose HOLMES (Hierarchically Organized Light-weight Multiple dEtector System) to reinforce DNNs by detecting potential adversarial examples, minimizing the threats they may pose in practice. HOLMES is able to distinguish unseen adversarial examples from multiple attacks with high accuracy and low…
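The idea of training several detectors on output logits and combining their decisions can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the synthetic logits from `fake_logits`, the sorted-logit feature, the nearest-centroid `CentroidDetector`, and the "flag if any detector fires" rule in `flag` are all hypothetical stand-ins for whatever detectors and hierarchy HOLMES actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)

def fake_logits(n, top, runner_up, num_classes=10):
    """Synthetic logits (made up for illustration, not real model output)."""
    z = rng.normal(0.0, 0.5, size=(n, num_classes))
    z[:, 0] += top        # winning class
    z[:, 1] += runner_up  # second-strongest class
    return z

def sorted_logits(logits):
    """Feature vector: logits sorted in descending order, ignoring class identity."""
    return np.sort(logits, axis=1)[:, ::-1]

class CentroidDetector:
    """Hypothetical stand-in detector: nearest centroid on sorted logits."""

    def fit(self, benign, adversarial):
        self.mu_benign = sorted_logits(benign).mean(axis=0)
        self.mu_adv = sorted_logits(adversarial).mean(axis=0)
        return self

    def predict(self, logits):
        f = sorted_logits(logits)
        d_benign = np.linalg.norm(f - self.mu_benign, axis=1)
        d_adv = np.linalg.norm(f - self.mu_adv, axis=1)
        return d_adv < d_benign  # True means "looks adversarial"

# Assumed logit patterns: benign inputs are confidently classified, one
# attack leaves a thin margin, another produces an exaggerated top logit.
benign = fake_logits(500, top=10.0, runner_up=0.0)
detectors = [
    CentroidDetector().fit(benign, fake_logits(500, top=3.0, runner_up=2.5)),
    CentroidDetector().fit(benign, fake_logits(500, top=25.0, runner_up=0.0)),
]

def flag(logits):
    """Assumed combination rule: flag an input if any detector fires."""
    votes = np.stack([d.predict(logits) for d in detectors])
    return votes.any(axis=0)

# Fresh benign logits, and an attack pattern that no detector was trained on.
print("false positive rate:", flag(fake_logits(200, 10.0, 0.0)).mean())
print("unseen attack detection rate:", flag(fake_logits(200, 4.0, 3.0)).mean())
```

The sketch only reads the logits, so the protected model is never modified, which mirrors the external-detector setting described in the abstract. The synthetic data is deliberately easy to separate, so the printed rates say nothing about real attacks.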
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Anomaly Detection Techniques and Applications
