Robust Alignment: Harmonizing Clean Accuracy and Adversarial Robustness in Adversarial Training
Yanyun Wang, Qingqing Ye, Li Liu, Zi Liang, Haibo Hu

TL;DR
This paper introduces Robust Alignment, a new adversarial training target that improves the balance between clean accuracy and adversarial robustness by aligning the input and latent spaces.
Contribution
The paper proposes a new training target and two techniques, fixed perturbation intensity and DICAR, to harmonize accuracy and robustness in adversarial training.
Findings
RAAT, the proposed method, achieves a better accuracy-robustness trade-off than four baselines and 14 state-of-the-art (SOTA) methods.
The method improves robustness on the CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets.
Experimental results confirm the effectiveness of the proposed approach.
Abstract
Adversarial Training (AT) is one of the most effective methods for developing robust deep neural networks (DNNs). However, AT faces a trade-off between clean accuracy and adversarial robustness. In this work, we reveal a surprising phenomenon for the first time: varying input perturbation intensities for training samples near decision boundaries in AT has minimal impact on model robustness. This finding directly exposes the inconsistency between accuracy and robustness score fluctuations, leading us to identify the misalignment between input and latent spaces as a critical driver of the robustness-accuracy trade-off. To mitigate this misalignment and harmonize accuracy and robustness, we define Robust Alignment as a new AT target, encouraging the model's perception to change with input perturbations provided the final label prediction remains unchanged, which can be achieved…
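For readers unfamiliar with the baseline the abstract builds on, here is a minimal sketch of conventional PGD-based adversarial training in the style of Madry et al. Each training input is replaced by a loss-maximizing perturbation inside an L∞ ball of radius ε, which is the "perturbation intensity". In this sketch one fixed ε is shared by every sample, in contrast to adaptive schemes that vary ε per sample near the decision boundary. This is not the authors' RAAT, Robust Alignment objective, or DICAR. The toy 2-D logistic-regression model, the data, the hyperparameters, and all function names (`pgd_linf`, `adversarial_train`, `accuracies`, `make_blobs`) are hypothetical choices for illustration only.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def sign(v):
    return (v > 0) - (v < 0)

def pgd_linf(w, b, x, y, eps, alpha, steps, rng):
    """Maximize the logistic loss inside an L-inf ball of fixed radius eps around x (y in {0, 1})."""
    x_adv = [xi + rng.uniform(-eps, eps) for xi in x]  # random start inside the ball
    for _ in range(steps):
        p = sigmoid(dot(w, x_adv) + b)
        grad_x = [(p - y) * wi for wi in w]  # d(loss)/dx for a linear model
        x_adv = [xa + alpha * sign(g) for xa, g in zip(x_adv, grad_x)]  # signed ascent step
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]  # project back
    return x_adv

def adversarial_train(data, eps, epochs=30, lr=0.1, seed=0):
    """PGD adversarial training: every sample gets the same budget eps (no per-sample adaptation)."""
    rng = random.Random(seed)
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            x_adv = pgd_linf(w, b, x, y, eps, alpha=eps / 4, steps=10, rng=rng)
            p = sigmoid(dot(w, x_adv) + b)
            # SGD step on the loss at the adversarial point only.
            w = [wi - lr * (p - y) * xi for wi, xi in zip(w, x_adv)]
            b -= lr * (p - y)
    return w, b

def accuracies(w, b, data, eps):
    """Clean accuracy and exact L-inf robust accuracy for a linear classifier."""
    l1 = sum(abs(wi) for wi in w)
    clean = robust = 0
    for x, y in data:
        margin = (2 * y - 1) * (dot(w, x) + b)  # signed margin, labels mapped to {-1, +1}
        clean += margin > 0
        robust += margin - eps * l1 > 0  # worst case shrinks the margin by eps * ||w||_1
    n = len(data)
    return clean / n, robust / n

def make_blobs(n, seed=1):
    """Two Gaussian blobs centred at (-1, -1) and (1, 1); toy data only."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        y = i % 2
        centre = 1.0 if y else -1.0
        data.append(([rng.gauss(centre, 0.7), rng.gauss(centre, 0.7)], y))
    return data

if __name__ == "__main__":
    data = make_blobs(200)
    for eps in (0.0, 0.25, 0.5):
        w, b = adversarial_train(data, eps)
        clean, robust = accuracies(w, b, data, eps)
        print(f"eps={eps:.2f}  clean={clean:.3f}  robust={robust:.3f}")
```

A linear model is used so that robust accuracy can be computed exactly rather than estimated with another attack. For f(x) = w·x + b, any perturbation with ‖δ‖∞ ≤ ε lowers the signed margin by at most ε‖w‖₁, so the robust accuracy reported here can never exceed the clean accuracy. Real AT on CIFAR-scale DNNs would use an autodiff framework and an empirical attack to evaluate robustness.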
