Robustness Feature Adapter for Efficient Adversarial Training

Quanwei Wu; Jun Guo; Wei Wang; Yi Wang

arXiv:2508.17680 · cs.LG · August 26, 2025

TL;DR

This paper introduces an adapter-based method for adversarial training that improves robustness, reduces computational cost, and mitigates robust overfitting. The method applies to large backbone models and to different architectures.

Contribution

It proposes a novel adapter-based approach for efficient adversarial training directly in feature space, addressing robustness, overfitting, and scalability issues.

Findings

01. Improves convergence quality during adversarial training.

02. Reduces computational overhead significantly.

03. Enhances robustness to unseen attacks.

Abstract

Adversarial training (AT) with projected gradient descent is the most popular method to improve model robustness under adversarial attacks. However, computational overheads become prohibitively large when AT is applied to large backbone models. AT is also known to have the issue of robust overfitting. This paper contributes to solving both problems simultaneously towards building more trustworthy foundation models. In particular, we propose a new adapter-based approach for efficient AT directly in the feature space. We show that the proposed adapter-based approach can improve the inner-loop convergence quality by eliminating robust overfitting. As a result, it significantly increases computational efficiency and improves model accuracy by generalizing adversarial robustness to unseen attacks. We demonstrate the effectiveness of the new adapter-based approach in different backbone…
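
For readers unfamiliar with the projected gradient descent attack that the abstract refers to, the following is a minimal sketch of the standard L-infinity PGD inner loop, not the paper's adapter method. The toy logistic model, the function names (`logistic_loss_and_input_grad`, `pgd_linf`), and the values of `eps`, `step`, and `n_steps` are all assumptions chosen for illustration. The vector `x` can be read as either an input or a feature vector.

```python
import math

def logistic_loss_and_input_grad(w, b, x, y):
    """Cross-entropy of a toy logistic model on (x, y) and its gradient w.r.t. x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    loss = -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    grad_x = [(p - y) * wi for wi in w]
    return loss, grad_x

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def pgd_linf(w, b, x, y, eps=0.1, step=0.02, n_steps=10):
    """Hypothetical helper: maximize the loss within an L-infinity ball of radius eps."""
    x_adv = list(x)
    for _ in range(n_steps):
        _, g = logistic_loss_and_input_grad(w, b, x_adv, y)
        # Ascent step along the sign of the gradient.
        x_adv = [xa + step * sign(gi) for xa, gi in zip(x_adv, g)]
        # Projection: clip each coordinate back into [x - eps, x + eps].
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv

# Illustrative values only (not taken from the paper).
w, b = [1.0, -2.0, 0.5], 0.0
x, y = [0.3, -0.1, 0.2], 1.0
x_adv = pgd_linf(w, b, x, y)
clean_loss, _ = logistic_loss_and_input_grad(w, b, x, y)
adv_loss, _ = logistic_loss_and_input_grad(w, b, x_adv, y)
print(clean_loss, adv_loss)
```

In adversarial training, an inner loop of this kind runs for every training batch, with one gradient computation through the model per step. That repeated cost is the overhead the abstract describes as prohibitive for large backbones.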
