Revisiting adapters with adversarial training
Sylvestre-Alvise Rebuffi, Francesco Croce, Sven Gowal

TL;DR
This paper demonstrates that using adapters with adversarial training in Vision Transformers improves clean accuracy, enables model soups that trade off clean and robust accuracy, and adapts well to distribution shifts, while adding far fewer parameters than dual normalization layers.
Contribution
It shows that adapters suffice for co-training on clean and adversarial inputs without batch statistic separation, improving accuracy and enabling flexible model combinations.
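The idea of sharing one network across both input types while keeping only a small adapter per type can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the adapter is the ViT classification token, and the names `build_input_sequence`, `co_training_loss`, and the weighting `alpha` are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def build_input_sequence(
    patch_embeddings: List[Vector],
    class_tokens: Dict[str, Vector],
    domain: str,
) -> List[Vector]:
    """Prepend the class token that belongs to `domain` to the patch embeddings.

    Everything after this step (attention blocks, normalization, head) is
    assumed to be shared between clean and adversarial inputs.
    """
    if domain not in class_tokens:
        raise KeyError(f"no adapter registered for domain {domain!r}")
    token = class_tokens[domain]
    if any(len(p) != len(token) for p in patch_embeddings):
        raise ValueError("class token and patch embeddings must share a width")
    return [list(token)] + [list(p) for p in patch_embeddings]


def co_training_loss(clean_loss: float, adv_loss: float, alpha: float = 0.5) -> float:
    """Combine the two losses; the convex weighting is an assumption."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * clean_loss + (1.0 - alpha) * adv_loss


# Hypothetical toy values: two patches of width 2, one token per input type.
adapters = {"clean": [0.0, 0.0], "adv": [9.0, 9.0]}
patches = [[1.0, 2.0], [3.0, 4.0]]
adv_sequence = build_input_sequence(patches, adapters, "adv")
```

Only the entries of `adapters` differ between the two input types in this sketch; batch statistics are not separated anywhere.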
Findings
Improved the top-1 accuracy of a non-adversarially trained ViT-B16 by +1.12% on ImageNet, reaching 83.76%.
Enabled model soups that trade off clean and adversarial accuracy.
Improved top-1 accuracy on ImageNet variants by +4.00%.
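A model soup averages the weights of several models instead of their predictions. A minimal sketch of how that could apply here, assuming the soup is a linear interpolation between the clean and adversarial adapter vectors while all other weights stay shared (the function name `soup_adapter` and the coefficient `beta` are hypothetical):

```python
from typing import List


def soup_adapter(clean_token: List[float], adv_token: List[float], beta: float) -> List[float]:
    """Linearly interpolate two adapter vectors.

    beta = 1.0 returns the clean adapter, beta = 0.0 the adversarial one;
    values in between are the candidate trade-off points.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    if len(clean_token) != len(adv_token):
        raise ValueError("adapters must have the same length")
    return [beta * c + (1.0 - beta) * a for c, a in zip(clean_token, adv_token)]


# Hypothetical two-dimensional adapters, mixed in equal parts.
mixed = soup_adapter([1.0, 2.0], [3.0, 4.0], beta=0.5)
```

Sweeping `beta` over a grid yields a family of models at no extra training cost, since only the adapter changes.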
Abstract
While adversarial training is generally used as a defense mechanism, recent works show that it can also act as a regularizer. By co-training a neural network on clean and adversarial inputs, it is possible to improve classification accuracy on the clean, non-adversarial inputs. We demonstrate that, contrary to previous findings, it is not necessary to separate batch statistics when co-training on clean and adversarial inputs, and that it is sufficient to use adapters with few domain-specific parameters for each type of input. We establish that using the classification token of a Vision Transformer (ViT) as an adapter is enough to match the classification performance of dual normalization layers, while using significantly fewer additional parameters. First, we improve upon the top-1 accuracy of a non-adversarially trained ViT-B16 model by +1.12% on ImageNet (reaching 83.76% top-1…
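To give a feel for the parameter saving mentioned above, the following back-of-the-envelope count compares the two adapter choices. It is a sketch under stated assumptions, not figures from the paper: it assumes the standard ViT-B/16 layout (hidden width 768, 12 blocks, two LayerNorms per block plus a final one) and that dual normalization duplicates the scale and offset of every LayerNorm.

```python
HIDDEN_DIM = 768   # standard ViT-B/16 embedding width
NUM_BLOCKS = 12    # standard ViT-B/16 depth


def extra_params_class_token(hidden_dim: int = HIDDEN_DIM) -> int:
    """A second class token adds one vector of size hidden_dim."""
    return hidden_dim


def extra_params_dual_layer_norm(
    hidden_dim: int = HIDDEN_DIM, num_blocks: int = NUM_BLOCKS
) -> int:
    """A second copy of every LayerNorm adds a scale and an offset vector each.

    Assumes two LayerNorms per block plus one final LayerNorm.
    """
    num_layer_norms = 2 * num_blocks + 1
    return num_layer_norms * 2 * hidden_dim


token_cost = extra_params_class_token()      # 768
norm_cost = extra_params_dual_layer_norm()   # 25 * 2 * 768 = 38400
ratio = norm_cost // token_cost              # 50
```

Under these assumptions the class-token adapter needs about 50 times fewer additional parameters than duplicated normalization layers.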
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Advanced Neural Network Applications
Methods: Model Soups · Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Dropout · Softmax · Label Smoothing · Adam · Adapter
