Identifying and Understanding Cross-Class Features in Adversarial Training
Zeming Wei, Yiwen Guo, Yisen Wang

TL;DR
This paper investigates the role of cross-class features in adversarial training, revealing how models initially learn shared features for robustness and later focus on class-specific features, offering new insights into AT mechanisms.
Contribution
It introduces the concept of cross-class features in AT, provides theoretical and empirical evidence of their impact, and offers a unified view of AT properties like soft-label training and robust overfitting.
Findings
Models learn more cross-class features early in AT.
As training continues into robust overfitting, models rely more on class-specific features.
These observations give a unified view of AT properties such as soft-label training and robust overfitting.
Abstract
Adversarial training (AT) is considered one of the most effective methods for making deep neural networks robust against adversarial attacks, yet the training mechanisms and dynamics of AT remain open research problems. In this paper, we present a novel perspective on studying AT through the lens of class-wise feature attribution. Specifically, we identify the impact on AT of a key family of features that are shared by multiple classes, which we call cross-class features. These features are typically useful for robust classification, as we illustrate with theoretical evidence from a synthetic data model. Through systematic studies across multiple model architectures and settings, we find that during the initial stage of AT, the model tends to learn more cross-class features until the best robustness checkpoint. As AT further squeezes the training robust loss and causes…
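The idea of features shared by multiple classes can be made concrete with a toy calculation. The sketch below is only an illustration under stated assumptions, not necessarily the attribution measure used in the paper: it assumes a linear classification head, takes a class's attribution to be the average of each feature activation times that class's own weight, and compares classes by cosine similarity. The function names and the toy data are hypothetical.

```python
import math

def class_attributions(features, labels, weights):
    """Per class, average (feature activation * that class's weight).

    Assumption: a linear head, so weights[c][k] is the contribution
    of feature k to the logit of class c.
    """
    num_classes, dim = len(weights), len(weights[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for x, y in zip(features, labels):
        counts[y] += 1
        for k in range(dim):
            sums[y][k] += x[k] * weights[y][k]
    # max(..., 1) avoids dividing by zero for a class with no samples
    return [[s / max(counts[c], 1) for s in sums[c]]
            for c in range(num_classes)]

def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def cross_class_matrix(attr):
    """Pairwise similarity of class attribution vectors."""
    return [[cosine(u, v) for v in attr] for u in attr]

# Hypothetical toy data: feature 0 is used by classes 0 and 1,
# features 1-3 are each specific to a single class.
features = [[1.0, 1.0, 0.0, 0.0],
            [1.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]
labels = [0, 1, 2]
weights = [[1.0, 1.0, 0.0, 0.0],
           [1.0, 0.0, 1.0, 0.0],
           [0.0, 0.0, 0.0, 1.0]]

attr = class_attributions(features, labels, weights)
sim = cross_class_matrix(attr)
```

In this toy setup, a larger off-diagonal entry of `sim` means two classes lean on the same features. Classes 0 and 1 share feature 0 and so get a positive similarity, while class 2 shares nothing with the others and gets zero.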
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning
