Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility
Melih Barsbey, Lucas Prieto, Stefanos Zafeiriou, Tolga Birdal

TL;DR
This paper shows that training with large learning rates improves both robustness to spurious correlations and model compressibility by promoting invariant feature utilization, class separation, and activation sparsity.
Contribution
It shows that high learning rates simultaneously improve robustness and compressibility, and links the mechanism behind this effect to confident mispredictions of bias-conflicting samples.
Findings
Large learning rates improve robustness to spurious correlations.
High learning rates promote invariant feature utilization and sparsity.
The effect is consistent across datasets, models, and optimizers.
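To make "activation sparsity" and "compressibility" concrete, here is a minimal sketch of how such quantities could be measured. The function names, the zero tolerance, and the use of magnitude pruning as a compressibility proxy are assumptions for illustration; this summary does not state which metrics the paper itself uses.

```python
import numpy as np

def activation_sparsity(activations, tol=0.0):
    """Fraction of activation entries whose magnitude is <= tol (hypothetical metric)."""
    activations = np.asarray(activations, dtype=float)
    return float(np.mean(np.abs(activations) <= tol))

def magnitude_prune(weights, keep_fraction):
    """Keep roughly the largest-magnitude `keep_fraction` of entries, zero the rest.

    Ties at the threshold are all kept, so slightly more entries may survive.
    """
    weights = np.asarray(weights, dtype=float)
    k = int(round(keep_fraction * weights.size))
    if k <= 0:
        return np.zeros_like(weights)
    if k >= weights.size:
        return weights.copy()
    # The k-th largest magnitude serves as the pruning threshold.
    threshold = np.sort(np.abs(weights).ravel())[-k]
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

def relative_pruning_error(weights, keep_fraction):
    """Relative L2 error introduced by pruning; lower means more compressible."""
    weights = np.asarray(weights, dtype=float)
    pruned = magnitude_prune(weights, keep_fraction)
    return float(np.linalg.norm(weights - pruned) / np.linalg.norm(weights))
```

Under these assumptions, one would compute both quantities for models trained at different learning rates and compare them: higher sparsity and lower pruning error at a fixed `keep_fraction` would indicate a more compressible network.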
Abstract
Robustness and resource-efficiency are two highly desirable properties for modern machine learning models. However, achieving them jointly remains a challenge. In this paper, we identify high learning rates as a facilitator for simultaneously achieving robustness to spurious correlations and network compressibility. We demonstrate that large learning rates also produce desirable representation properties such as invariant feature utilization, class separation, and activation sparsity. Our findings indicate that large learning rates compare favorably to other hyperparameters and regularization methods, in consistently satisfying these properties in tandem. In addition to demonstrating the positive effect of large learning rates across diverse spurious correlation datasets, models, and optimizers, we also present strong evidence that the previously documented success of large learning…
