Calibrated Adversarial Training
Tianjin Huang, Vlado Menkovski, Yulong Pei, Mykola Pechenizkiy

TL;DR
This paper introduces Calibrated Adversarial Training, a method that reduces the adverse effects of semantic perturbations in adversarial training. It adapts perturbations at the pixel level using a calibrated robust error, for which the authors derive an upper bound, and it is reported to outperform existing adversarial training methods on multiple datasets.
Contribution
The paper proposes a calibrated robust error and a training method built on it that limits semantic distortions in adversarial examples, which could otherwise change an example's true label.
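For context, the standard (uncalibrated) robust error that adversarial training usually targets can be written as below. The notation (classifier f, data distribution D, budget ε, norm p) is assumed here for illustration and is not taken from the paper. The calibrated variant is not reproduced because this summary does not state its definition.

\[
\mathcal{R}_{\mathrm{rob}}(f) \;=\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\,\max_{\|\delta\|_{p}\le\epsilon} \mathbb{1}\{f(x+\delta)\neq y\}\right]
\]

An example counts as an error if any perturbation δ within the budget changes the prediction away from y, which implicitly assumes that the true label stays y everywhere inside the budget. The abstract's concern is the case where that assumption fails.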
Findings
Outperforms existing adversarial training methods on multiple datasets.
Provides theoretical bounds for the calibrated robust error.
Shows improved model robustness against adversarial attacks.
Abstract
Adversarial training is an approach to increasing the robustness of models to adversarial attacks by including adversarial examples in the training set. One major challenge in producing adversarial examples is to include enough perturbation to flip the model's output without severely changing the example's semantic content. Excessive change in the semantic content could also change the true label of the example. Adding such examples to the training set has adverse effects. In this paper, we present Calibrated Adversarial Training, a method that reduces the adverse effects of semantic perturbations in adversarial training. The method produces pixel-level adaptations to the perturbations based on a novel calibrated robust error. We provide a theoretical analysis of the calibrated robust error and derive an upper bound for it. Our empirical results…
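The two ingredients the abstract describes, a bounded perturbation that flips the model's output and a pixel-level adaptation of that perturbation, can be illustrated with a minimal sketch. This is not the paper's algorithm: it uses a single signed-gradient (FGSM-style) step on a toy logistic model, and the per-pixel `mask` is a hand-set, hypothetical stand-in for whatever adaptation the calibrated robust error produces. All names and numbers are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Probability of class 1 under a toy linear (logistic) model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def loss_gradient_wrt_input(weights, bias, x, y):
    """Gradient of the cross-entropy loss with respect to each input pixel."""
    p = predict(weights, bias, x)
    return [(p - y) * w for w in weights]

def sign(v):
    return (v > 0) - (v < 0)

def masked_fgsm(weights, bias, x, y, epsilon, mask):
    """One signed-gradient step, scaled per pixel by mask[i] in [0, 1].

    mask is a hypothetical placeholder for a pixel-level adaptation;
    a mask of all ones gives an ordinary FGSM step.
    """
    grad = loss_gradient_wrt_input(weights, bias, x, y)
    adv = []
    for xi, gi, mi in zip(x, grad, mask):
        step = epsilon * mi * sign(gi)
        adv.append(min(1.0, max(0.0, xi + step)))  # keep pixels in [0, 1]
    return adv

# Toy 4-pixel "image" with true label 1 (all values are made up).
weights = [2.0, -1.0, 0.5, 3.0]
bias = -1.5
x = [0.8, 0.2, 0.5, 0.6]
y = 1
epsilon = 0.4

full_mask = [1.0, 1.0, 1.0, 1.0]
# Assumption for illustration: pixel 3 carries the label-defining content,
# so its perturbation is suppressed entirely.
soft_mask = [1.0, 1.0, 1.0, 0.0]

adv_full = masked_fgsm(weights, bias, x, y, epsilon, full_mask)
adv_soft = masked_fgsm(weights, bias, x, y, epsilon, soft_mask)

print("clean    p(class 1):", predict(weights, bias, x))
print("unmasked p(class 1):", predict(weights, bias, adv_full))
print("masked   p(class 1):", predict(weights, bias, adv_soft))
```

In this toy setup the unmasked step pushes the example across the decision boundary, while the masked step leaves the designated pixel untouched and the prediction unchanged. In adversarial training, examples produced this way would be added to the training batches; how the adaptation is actually computed is specified in the paper, not here.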
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications
Methods: FLIP
