Conflict-Aware Adversarial Training

Zhiyu Xue; Haohan Wang; Yao Qin; Ramtin Pedarsani

arXiv:2410.16579·cs.LG·October 23, 2024

Open Access · 3 Reviews

TL;DR

This paper introduces Conflict-Aware Adversarial Training (CA-AT), a novel method that improves the balance between standard accuracy and adversarial robustness by addressing gradient conflicts in training.

Contribution

The paper proposes a conflict-aware factor to optimize the trade-off between standard and adversarial loss, outperforming traditional weighted-average methods.

Findings

01

CA-AT achieves a better trade-off between standard accuracy and adversarial robustness than weighted-average adversarial training.

02

The conflict-aware factor reduces gradient conflicts during training.

03

Experiments across several datasets and model architectures, covering both training from scratch and parameter-efficient fine-tuning, validate the effectiveness of CA-AT.

Abstract

Adversarial training is the most effective method to obtain adversarial robustness for deep neural networks by directly involving adversarial samples in the training procedure. To obtain an accurate and robust model, the weighted-average method is applied to optimize standard loss and adversarial loss simultaneously. In this paper, we argue that the weighted-average method does not provide the best trade-off between standard performance and adversarial robustness. We attribute this failure to the conflict between the gradients derived from the standard and adversarial losses, and further demonstrate, theoretically and practically, that this conflict increases with the attack budget. To alleviate this problem, we propose a new trade-off paradigm for adversarial training with a conflict-aware factor for the convex combination of standard and adversarial loss, named…
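
The gradient conflict described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the helper names, the toy gradient vectors, and the convention of putting weight `lam` on the adversarial gradient and `1 - lam` on the standard gradient are all assumptions made for illustration. It measures the angle between a standard-loss gradient `g_c` and an adversarial-loss gradient `g_a`, then forms their weighted average.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean norm of a vector."""
    return math.sqrt(dot(u, u))


def conflict_angle(g_c, g_a):
    """Angle in radians between the standard and adversarial gradients."""
    cos = dot(g_c, g_a) / (norm(g_c) * norm(g_a))
    cos = max(-1.0, min(1.0, cos))  # guard against rounding outside [-1, 1]
    return math.acos(cos)


def weighted_average_gradient(g_c, g_a, lam):
    """Weighted average of the two gradients.

    Assumed convention: lam weights the adversarial gradient,
    (1 - lam) weights the standard gradient.
    """
    return [(1.0 - lam) * c + lam * a for c, a in zip(g_c, g_a)]


# Toy gradients (hypothetical values) that point in conflicting directions.
g_c = [1.0, 0.0]
g_a = [-1.0, 1.0]

phi = conflict_angle(g_c, g_a)
g_avg = weighted_average_gradient(g_c, g_a, lam=0.5)

# Alignment of the averaged update with the standard gradient.
alignment_with_standard = dot(g_avg, g_c)
```

In this toy case the two gradients are 135 degrees apart, and the averaged update is orthogonal to `g_c`, so to first order a step along it would not reduce the standard loss at all. That is one way to picture why a fixed weighted average can give a poor trade-off when the gradients conflict.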

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 5 · Confidence 4

Strengths

1. The proposed Conflict-Aware Adversarial Training (CA-AT) effectively addresses gradient conflict, achieving a better balance between standard performance and adversarial robustness.
2. Comprehensive experimental results across various datasets and model architectures validate the effectiveness of CA-AT in improving the trade-off between standard and adversarial accuracy.
3. The method demonstrates strong performance in both training from scratch and parameter-efficient fine-tuning, showcasi…

Weaknesses

1. The proposed CA-AT manipulates the gradients of the clean and adversarial examples; this idea is not novel, whether for input gradient alignment [1] or model gradient alignment [2].
2. I think that, starting from PGD adversarial training, methods use only the adversarial examples for training, rather than the combination that CA-AT wants to tackle.
3. Although CA-AT shows improved performance over Vanilla AT in several experiments, it lacks comparisons with other advanced adversaria…

Reviewer 02 · Rating 6 · Confidence 4

Strengths

- The paper is easy to follow.
- Extensive experiments are conducted to demonstrate the effectiveness of the proposed method empirically.

Weaknesses

- The proposed factor seems a bit bizarre to me. Take Figure 1 as an example. Any $g_a$ that ends in the dotted line with $\arccos(g_a, g_b) > \gamma$ will result in the same $g^*$, which doesn't make sense to me. Imagine $g_a$ equals $g_o$ in Figure 1, and one will still get the same $g^*$.
- One should conduct an ablation study by using traditional $\lambda$-weighted mean of $g_a$ and $g_c$ when $\phi \leq \gamma$ and only $g_c$ when $\phi > \gamma$, as I suspect that this might be the reason f…
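
The ablation suggested in the second point could be sketched roughly as follows. This is only an illustration of the rule as the reviewer states it, not code from the paper: the function names are hypothetical, $\phi$ is taken to be the angle between $g_a$ and $g_c$, and placing $\lambda$ on $g_a$ (with $1 - \lambda$ on $g_c$) is an assumed convention, since the review does not say which gradient the weight applies to.

```python
import math


def angle_between(u, v):
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # clamp for rounding
    return math.acos(cos)


def ablation_gradient(g_a, g_c, lam, gamma):
    """Reviewer-style ablation rule (hypothetical helper).

    If the angle phi between g_a and g_c is at most gamma, return the
    lam-weighted mean (assumed: lam on g_a, 1 - lam on g_c).
    Otherwise fall back to g_c alone.
    """
    phi = angle_between(g_a, g_c)
    if phi <= gamma:
        return [lam * a + (1.0 - lam) * c for a, c in zip(g_a, g_c)]
    return list(g_c)


# Mild disagreement (90 degrees) under a 135-degree threshold: weighted mean.
mild = ablation_gradient([1.0, 0.0], [0.0, 1.0], lam=0.5, gamma=0.75 * math.pi)

# Strong conflict (180 degrees) over a 90-degree threshold: only g_c is used.
strong = ablation_gradient([-1.0, 0.0], [1.0, 0.0], lam=0.5, gamma=0.5 * math.pi)
```

Comparing a rule like this against CA-AT would help separate the effect of the conflict threshold itself from the effect of how the gradients are combined once a conflict is detected.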

Reviewer 03 · Rating 6 · Confidence 4

Strengths

1. The paper is well-organized, featuring a logical structure with clear illustrations, algorithms, experimental results, images, and tables, making it easy to understand and follow.
2. The notation and terminology throughout the paper are highly consistent and precise, enhancing clarity and readability.
3. The paper introduces the metric $\mu$ based on the weighted average method to measure the conflict and convergence between gradients, providing theoretical upper bounds. The logical flow is…

Weaknesses

1. Please provide detailed accuracy results for experiments on ViT and Swin-T in table format.
2. Does CA-AT also perform well on larger datasets such as ImageNet?
3. Additional baselines that achieve similar levels of balance should be included, such as other advanced weighted methods or gradient operations. Using only Vanilla AT as a baseline is insufficient.

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Guidance and Control Systems