Identifying and Understanding Cross-Class Features in Adversarial Training

Zeming Wei; Yiwen Guo; Yisen Wang

arXiv:2506.05032 · cs.LG · June 6, 2025

TL;DR

This paper studies the role of cross-class features (features shared by multiple classes) in adversarial training (AT). It finds that models learn more of these shared features during the early stage of AT, up to the best-robustness checkpoint, and later rely increasingly on class-specific features, offering new insight into how AT works.
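For readers less familiar with AT, the sketch below shows the standard min-max formulation with an L-infinity PGD inner attack (Madry-style). It is a generic illustration, not the paper's training setup: it assumes a linear softmax classifier, full-batch gradient descent, PGD started from the clean inputs, and toy Gaussian data. All function names and hyperparameters are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss_and_grads(W, b, X, y):
    """Mean cross-entropy of logits = X @ W + b, plus gradients w.r.t. W, b and X."""
    n = X.shape[0]
    P = softmax(X @ W + b)
    G = P.copy()
    G[np.arange(n), y] -= 1.0          # dLoss/dlogits = (P - onehot) / n
    G /= n
    loss = -np.mean(np.log(P[np.arange(n), y] + 1e-12))
    return loss, X.T @ G, G.sum(axis=0), G @ W.T

def pgd_attack(W, b, X, y, eps, alpha, steps):
    """Inner maximization: L-infinity PGD, projected back into the eps-ball around X."""
    X_adv = X.copy()
    for _ in range(steps):
        *_, gX = loss_and_grads(W, b, X_adv, y)
        X_adv = np.clip(X_adv + alpha * np.sign(gX), X - eps, X + eps)
    return X_adv

def adversarial_train(X, y, k, eps=0.1, alpha=0.025, steps=5, lr=0.1, epochs=200, seed=0):
    """Outer minimization: gradient descent on the loss at PGD adversarial examples."""
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((X.shape[1], k))
    b = np.zeros(k)
    for _ in range(epochs):
        X_adv = pgd_attack(W, b, X, y, eps, alpha, steps)
        _, gW, gb, _ = loss_and_grads(W, b, X_adv, y)
        W -= lr * gW
        b -= lr * gb
    return W, b

# Illustrative data only: three well-separated 2-D Gaussian blobs.
rng = np.random.default_rng(0)
centers = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 2.0]])
y = np.repeat(np.arange(3), 50)
X = centers[y] + 0.3 * rng.standard_normal((150, 2))
W, b = adversarial_train(X, y, k=3)
X_adv = pgd_attack(W, b, X, y, eps=0.1, alpha=0.025, steps=5)
robust_acc = np.mean(np.argmax(X_adv @ W + b, axis=1) == y)
```

In real AT the model is a deep network and the inner attack usually starts from a random point in the eps-ball; the linear model here only keeps the example small enough to read in one pass.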

Contribution

It introduces the concept of cross-class features in AT, provides theoretical and empirical evidence of their impact on robustness, and offers a unified view of known AT phenomena such as the benefit of soft-label training and robust overfitting.
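The paper links soft-label training to cross-class features; the sketch below does not reproduce that argument. It only shows the mechanical difference between hard and soft targets, using label smoothing as one common form of soft labels (an assumption; the paper may consider other soft-label schemes). Helper names are hypothetical.

```python
import numpy as np

def smooth_labels(y, num_classes, smoothing=0.1):
    """Label smoothing: (1 - s) on the true class plus s / K spread over all K classes."""
    targets = np.full((len(y), num_classes), smoothing / num_classes)
    targets[np.arange(len(y)), y] += 1.0 - smoothing
    return targets

def soft_cross_entropy(logits, targets):
    """Mean cross-entropy between target distributions and softmax(logits)."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -np.mean(np.sum(targets * log_probs, axis=1))

# A very confident prediction for class 0.
logits = np.array([[10.0, 0.0, 0.0]])
hard_loss = soft_cross_entropy(logits, smooth_labels([0], 3, smoothing=0.0))
soft_loss = soft_cross_entropy(logits, smooth_labels([0], 3, smoothing=0.1))
```

With hard targets the confident prediction costs almost nothing (about 9e-5), whereas the smoothed targets keep a loss of roughly 0.67 because they ask the model to reserve some probability for the other classes.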

Findings

01. Models learn more cross-class features in the early stage of AT, up to the best-robustness checkpoint.
02. As robust overfitting sets in, models rely increasingly on class-specific features.
03. These findings refine the current understanding of AT mechanisms.

Abstract

Adversarial training (AT) has been considered one of the most effective methods for making deep neural networks robust against adversarial attacks, while the training mechanisms and dynamics of AT remain open research problems. In this paper, we present a novel perspective on studying AT through the lens of class-wise feature attribution. Specifically, we identify the impact of a key family of features on AT that are shared by multiple classes, which we call cross-class features. These features are typically useful for robust classification, which we offer theoretical evidence to illustrate through a synthetic data model. Through systematic studies across multiple model architectures and settings, we find that during the initial stage of AT, the model tends to learn more cross-class features until the best robustness checkpoint. As AT further squeezes the training robust loss and causes…
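To make "class-wise feature attribution" concrete, here is a toy sketch under stated assumptions, not the paper's exact metric: take a linear head on top of extracted features, attribute the class-c logit to feature j as features[:, j] * W[j, c], average (and clip at zero) over samples of class c, and measure feature sharing as the mean pairwise cosine similarity between the resulting class attribution vectors. Higher similarity suggests more cross-class features. The functions `class_attributions` and `cross_class_similarity` are hypothetical.

```python
import numpy as np

def class_attributions(features, labels, W, num_classes):
    """Average positive attribution vector per class for a linear head logits = features @ W."""
    A = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        contrib = features[labels == c] * W[:, c]   # per-sample, per-feature contribution to logit c
        A[c] = np.maximum(contrib.mean(axis=0), 0.0)  # keep features that support class c
    return A

def cross_class_similarity(A):
    """Mean off-diagonal cosine similarity between class attribution vectors."""
    U = A / np.maximum(np.linalg.norm(A, axis=1, keepdims=True), 1e-12)
    S = U @ U.T
    k = A.shape[0]
    return (S.sum() - np.trace(S)) / (k * (k - 1))

# Toy case: 2 classes, 3 features, all feature activations equal to 1.
feats = np.ones((2, 3))
labels = np.array([0, 1])
W_specific = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # each class uses its own feature
W_shared = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])    # feature 2 supports both classes
sim_specific = cross_class_similarity(class_attributions(feats, labels, W_specific, 2))
sim_shared = cross_class_similarity(class_attributions(feats, labels, W_shared, 2))
```

Purely class-specific weights give a similarity of 0, while adding one feature used by both classes raises it to 0.5, which is the kind of signal one would track across AT checkpoints under this hypothetical measure.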

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning