Confidence-Calibrated Adversarial Training: Generalizing to Unseen Attacks

David Stutz; Matthias Hein; Bernt Schiele

arXiv:1910.06259·cs.LG·July 1, 2020·35 cites

TL;DR

This paper introduces confidence-calibrated adversarial training (CCAT), which biases the model towards low-confidence predictions on adversarial examples and rejects low-confidence inputs, so that robustness extends to attack types not seen during training.

Contribution

The authors propose CCAT, an adversarial training method that trains the model to predict with low confidence on adversarial examples and allows low-confidence examples to be rejected. This lets robustness generalize beyond the threat model used during training.
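
One way to bias a model towards low confidence on adversarial examples is to train it against a softened target that moves from the one-hot label towards the uniform distribution as the perturbation grows. The sketch below illustrates that idea only. The function name, the parameters `epsilon` and `rho`, and the power-law interpolation schedule are assumptions made for illustration and are not specified in the summary above.

```python
def calibrated_target(label, num_classes, perturbation_norm, epsilon, rho=1.0):
    """Interpolate between a one-hot label and the uniform distribution.

    All names and the schedule are illustrative assumptions:
    - perturbation_norm: size of the adversarial perturbation (e.g. its L-infinity norm)
    - epsilon: perturbation budget used during training
    - rho: controls how quickly the target decays towards uniform
    """
    # Fraction of the budget used, clipped to 1 for larger perturbations.
    ratio = min(1.0, perturbation_norm / epsilon)
    # Weight on the true label: 1 for clean inputs, 0 at or beyond the budget.
    weight = (1.0 - ratio) ** rho
    uniform = 1.0 / num_classes
    return [
        weight * (1.0 if k == label else 0.0) + (1.0 - weight) * uniform
        for k in range(num_classes)
    ]


# A clean example keeps its one-hot label; a perturbation at the full
# budget maps to the uniform distribution over classes.
clean = calibrated_target(label=0, num_classes=4, perturbation_norm=0.0, epsilon=0.03)
full = calibrated_target(label=0, num_classes=4, perturbation_norm=0.03, epsilon=0.03)
print(clean, full)
```

Training against such a target with a cross-entropy loss pushes the predicted confidence down on perturbed inputs, which is what makes a confidence threshold usable for rejection at test time.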

Findings

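01 CCAT, trained only on $L_{\infty}$ adversarial examples, improves robustness against larger $L_{\infty}$, $L_{2}$, $L_{1}$ and $L_{0}$ attacks, adversarial frames, distal adversarial examples and corrupted examples.

02 CCAT achieves better clean accuracy than standard adversarial training.

03 The authors developed novel white- and black-box attacks that directly target CCAT, so that confidence-based defenses can be evaluated thoroughly.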

Abstract

Adversarial training yields robust models against a specific threat model, e.g., $L_{\infty}$ adversarial examples. Typically, robustness does not generalize to previously unseen threat models, e.g., other $L_{p}$ norms or larger perturbations. Our confidence-calibrated adversarial training (CCAT) tackles this problem by biasing the model towards low-confidence predictions on adversarial examples. By allowing rejection of examples with low confidence, robustness generalizes beyond the threat model employed during training. CCAT, trained only on $L_{\infty}$ adversarial examples, increases robustness against larger $L_{\infty}$, $L_{2}$, $L_{1}$ and $L_{0}$ attacks, adversarial frames, distal adversarial examples and corrupted examples, and yields better clean accuracy compared to adversarial training. For thorough evaluation we developed novel white- and black-box attacks directly attacking CCAT by…
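
The rejection step described in the abstract can be sketched as a simple confidence threshold on the predicted class probabilities. This is a minimal illustration, not the authors' implementation. The function name `predict_with_rejection`, the `threshold` parameter and the example probabilities are hypothetical.

```python
def predict_with_rejection(probs, threshold):
    """Return one prediction per example, or None where the example is rejected.

    probs: list of per-class probability lists, one per example.
    threshold: hypothetical confidence cutoff; an example is rejected when
    its highest class probability is below this value.
    """
    results = []
    for row in probs:
        confidence = max(row)          # confidence = highest class probability
        label = row.index(confidence)  # predicted class (first maximum)
        results.append(label if confidence >= threshold else None)
    return results


# Made-up probabilities: the second example is low-confidence and is rejected.
probs = [
    [0.97, 0.02, 0.01],
    [0.40, 0.35, 0.25],
    [0.05, 0.90, 0.05],
]
print(predict_with_rejection(probs, threshold=0.8))
```

The threshold trades off how many clean examples are rejected against how many adversarial examples slip through. How it is chosen is not stated in the text above.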

Code & Models

Videos

Confidence-Calibrated Adversarial Training: Generalizing to Unseen Attacks · slideslive

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Anomaly Detection Techniques and Applications

Methods: Test