Adaptive Adversarial Logits Pairing
Shangxi Wu, Jitao Sang, Kaiyuan Xu, Guanhua Zheng, and Changsheng Xu

TL;DR
This paper introduces Adaptive Adversarial Logits Pairing (AALP), a novel training method that improves adversarial robustness by adaptively focusing on key features and balancing training objectives, outperforming previous approaches.
Contribution
AALP modifies adversarial training with adaptive feature selection and sample weighting, enhancing robustness against adversarial attacks.
Findings
AALP achieves superior defense performance on multiple datasets.
Adaptive modules improve focus on high-contribution features.
Balanced training reduces overemphasis on logits pairing loss.
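For readers unfamiliar with the logits pairing loss mentioned above, the following is a minimal sketch of a generic ALP-style objective for a single example: a classification loss on the adversarial logits plus a penalty on the distance between clean and adversarial logits. The function names, the mean-squared-difference form of the pairing term, and the `pairing_weight` and `sample_weight` parameters are illustrative assumptions, not the authors' implementation. In particular, `sample_weight` is a hypothetical stand-in for a per-sample scaling of the pairing term; this page does not state the actual AALP weighting rule.

```python
import math

def softmax_cross_entropy(logits, label):
    """Cross-entropy of one example, computed from raw logits in a stable way."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def logits_pairing(clean_logits, adv_logits):
    """Mean squared difference between clean and adversarial logits (assumed form)."""
    diffs = [(c - a) ** 2 for c, a in zip(clean_logits, adv_logits)]
    return sum(diffs) / len(diffs)

def alp_style_loss(clean_logits, adv_logits, label,
                   pairing_weight=0.5, sample_weight=1.0):
    """Classification loss plus a weighted pairing penalty.

    sample_weight is hypothetical: it only illustrates how the pairing term
    could be scaled down for samples where it dominates the objective.
    """
    ce = softmax_cross_entropy(adv_logits, label)
    pair = logits_pairing(clean_logits, adv_logits)
    return ce + pairing_weight * sample_weight * pair
```

Setting `sample_weight` to 0 for a given example reduces the objective to the classification loss alone, which is one simple way to picture what "reducing overemphasis" on the pairing term could mean.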
Abstract
Adversarial examples provide an opportunity as well as impose a challenge for understanding image classification systems. Based on an analysis of the adversarial training solution Adversarial Logits Pairing (ALP), we observed in this work that: (1) The inference of an adversarially robust model tends to rely on fewer high-contribution features than that of a vulnerable one. (2) The training target of ALP does not fit a noticeable portion of samples well; for these samples, the logits pairing loss is overemphasized and obstructs minimizing the classification loss. Motivated by these observations, we design an Adaptive Adversarial Logits Pairing (AALP) solution by modifying the training process and training target of ALP. Specifically, AALP consists of an adaptive feature optimization module with Guided Dropout to systematically pursue fewer high-contribution features, and an adaptive sample weighting…
Taxonomy
Methods: Dropout
