TL;DR
This paper introduces margin consistency as a way to efficiently detect vulnerable, non-robust samples in deep classifiers by linking input space margins with logit margins, enabling real-time vulnerability assessment.
Contribution
It establishes the theoretical link between input space and logit margins, and demonstrates how to use this for efficient detection of brittle decisions in robust models.
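The two margins can be written down as follows. This is a sketch in assumed notation (classifier logits $f_k$, norm $p$, and the symbols $d_{\mathrm{in}}$, $d_{\mathrm{out}}$ are chosen here for illustration; the paper's exact notation may differ). Margin consistency is stated here as an order-preserving relationship between the two quantities.

```latex
% Input space margin: distance from x to the nearest input that changes the predicted class.
\[
d_{\mathrm{in}}(x) = \inf_{x'} \bigl\{ \lVert x' - x \rVert_p \;:\; \arg\max_k f_k(x') \neq \arg\max_k f_k(x) \bigr\}
\]

% Logit margin: gap between the predicted class logit and the runner-up logit.
\[
d_{\mathrm{out}}(x) = f_i(x) - \max_{j \neq i} f_j(x), \qquad i = \arg\max_k f_k(x)
\]

% Margin consistency (assumed form): ordering by logit margin matches ordering by input margin.
\[
d_{\mathrm{in}}(x_1) \le d_{\mathrm{in}}(x_2) \iff d_{\mathrm{out}}(x_1) \le d_{\mathrm{out}}(x_2)
\quad \text{for all } x_1, x_2
\]
```

Under this reading, the expensive quantity $d_{\mathrm{in}}$ never has to be computed at deployment time: ranking samples by $d_{\mathrm{out}}$ gives the same ordering.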
Findings
High correlation between input space margins and logit margins in robust models
Logit margin can effectively detect non-robust, brittle decisions
Pseudo-margin learning improves detection when margin consistency is low
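The findings above can be illustrated with a minimal sketch, assuming the logit margin is the gap between the two largest logits and that rank agreement is measured with a Kendall-style correlation. The function names, the threshold, and the toy numbers are hypothetical and are not taken from the paper's code.

```python
from typing import List, Sequence


def logit_margin(logits: Sequence[float]) -> float:
    """Gap between the largest and second-largest logit of one sample."""
    if len(logits) < 2:
        raise ValueError("need at least two class logits")
    top, runner_up = sorted(logits, reverse=True)[:2]
    return top - runner_up


def flag_brittle(batch_logits: Sequence[Sequence[float]], threshold: float) -> List[bool]:
    """Mark samples whose logit margin falls below a (hypothetical) threshold."""
    return [logit_margin(row) < threshold for row in batch_logits]


def kendall_tau(a: Sequence[float], b: Sequence[float]) -> float:
    """Kendall rank correlation (tau-a), O(n^2); tied pairs contribute zero."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (a[i] - a[j]) * (b[i] - b[j])
            if s > 0:
                score += 1   # pair ordered the same way in both sequences
            elif s < 0:
                score -= 1   # pair ordered in opposite ways
    return score / (n * (n - 1) / 2)


# Toy logits for two samples (illustrative values only).
batch = [[2.0, 0.5, 1.5], [4.0, 1.0, 0.0]]
flags = flag_brittle(batch, threshold=1.0)
```

In practice the input space margins fed to `kendall_tau` would have to be estimated offline, for example with adversarial attacks on a held-out set, while the logit margin costs only a single forward pass per sample.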
Abstract
Despite extensive research on adversarial training strategies to improve robustness, the decisions of even the most robust deep learning models can still be quite sensitive to imperceptible perturbations, creating serious risks when deploying them for high-stakes real-world applications. While detecting such cases may be critical, evaluating a model's vulnerability at a per-instance level using adversarial attacks is computationally too intensive and unsuitable for real-time deployment scenarios. The input space margin is the exact score to detect non-robust samples and is intractable for deep neural networks. This paper introduces the concept of margin consistency -- a property that links the input space margins and the logit margins in robust models -- for efficient detection of vulnerable samples. First, we establish that margin consistency is a necessary and sufficient condition to…
Taxonomy
Methods: Sparse Evolutionary Training
