Improving Robustness with Adaptive Weight Decay
Amin Ghiasi, Ali Shafahi, Reza Ardekani

TL;DR
This paper introduces adaptive weight decay, which dynamically adjusts the regularization hyper-parameter during training, leading to significant improvements in adversarial robustness and reduced overfitting across various datasets and models.
Contribution
The authors propose a novel adaptive weight decay method that tunes the hyper-parameter on-the-fly based on gradient and weight norms, enhancing robustness and reducing overfitting.
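The summary does not state the exact update rule. One natural reading of "based on gradient and weight norms" is sketched below: it is an assumed form, not taken verbatim from the paper. Here λ_awd is a hypothetical name for a fixed ratio hyper-parameter, and η is the learning rate.

```latex
% Assumed form of adaptive weight decay (illustrative; not quoted from the paper)
w_{t+1} = w_t - \eta\left(\nabla_w \mathcal{L}_{\mathrm{CE}}(w_t) + \lambda_t\, w_t\right),
\qquad
\lambda_t = \lambda_{\mathrm{awd}}\,
  \frac{\lVert \nabla_w \mathcal{L}_{\mathrm{CE}}(w_t) \rVert_2}{\lVert w_t \rVert_2}
% Consequence: the decay term's magnitude tracks the classification gradient,
% \lVert \lambda_t w_t \rVert_2 = \lambda_{\mathrm{awd}} \lVert \nabla_w \mathcal{L}_{\mathrm{CE}}(w_t) \rVert_2
```

Under this assumption the regularization step is always a fixed fraction λ_awd of the cross-entropy step, rather than a fixed multiple of the weights as in ordinary weight decay.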
Findings
20% relative robustness improvement on CIFAR-100
10% relative robustness improvement on CIFAR-10
Lower sensitivity to the learning rate and smaller weight norms
Abstract
We propose adaptive weight decay, which automatically tunes the hyper-parameter for weight decay during each training iteration. For classification problems, we propose changing the value of the weight decay hyper-parameter on the fly based on the strength of updates from the classification loss (i.e., the gradient of cross-entropy) and the regularization loss (i.e., the ℓ2-norm of the weights). We show that this simple modification can yield large improvements in adversarial robustness (an area that suffers from robust overfitting) without requiring extra data, across various datasets and architecture choices. For example, our reformulation results in a 20% relative robustness improvement on CIFAR-100 and a 10% relative robustness improvement on CIFAR-10, compared to the best-tuned hyper-parameters of traditional weight decay, resulting in models that have comparable…
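As a rough illustration of the on-the-fly tuning described in the abstract, the sketch below performs one plain SGD step in which the weight decay coefficient is recomputed from the current gradient and weight norms. It assumes λ_t = λ_awd · ‖grad‖ / ‖w‖ over a single flat parameter vector; the names `adaptive_wd_step` and `lambda_awd` are hypothetical, and the paper's actual implementation (e.g., per-layer norms, smoothing, or optimizer choice) may differ.

```python
import math
from typing import List, Tuple


def l2_norm(v: List[float]) -> float:
    """Euclidean (l2) norm of a flat parameter vector."""
    return math.sqrt(sum(x * x for x in v))


def adaptive_wd_step(
    w: List[float],
    grad: List[float],
    lr: float,
    lambda_awd: float,
    eps: float = 1e-12,
) -> Tuple[List[float], float]:
    """One SGD step with a hypothetical adaptive weight-decay coefficient.

    Assumption (not confirmed by the source): lambda_t is chosen so that the
    decay term lambda_t * w has norm lambda_awd * ||grad||, i.e. a fixed
    fraction of the classification-loss gradient's norm.
    """
    # Recompute the decay coefficient every iteration from current norms;
    # eps guards against division by zero for all-zero weights.
    lambda_t = lambda_awd * l2_norm(grad) / (l2_norm(w) + eps)
    # Standard SGD update with an l2 (weight decay) term added to the gradient.
    new_w = [wi - lr * (gi + lambda_t * wi) for wi, gi in zip(w, grad)]
    return new_w, lambda_t


if __name__ == "__main__":
    # ||w|| = 5 and ||grad|| = 1, so lambda_t is approximately 0.5 * 1 / 5 = 0.1
    w, lam = adaptive_wd_step([3.0, 4.0], [0.6, 0.8], lr=0.1, lambda_awd=0.5)
    print(w, lam)
```

Because λ_t shrinks as the weights grow and grows with the gradient, the decay pressure automatically scales with the size of the classification update instead of requiring a separately tuned constant for each learning rate.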
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Advanced Neural Network Applications
Methods: Weight Decay
