Large Learning Rates Simultaneously Achieve Robustness to Spurious Correlations and Compressibility

Melih Barsbey; Lucas Prieto; Stefanos Zafeiriou; Tolga Birdal

arXiv:2507.17748 · cs.LG · August 6, 2025

TL;DR

This paper shows that training with large learning rates improves both robustness to spurious correlations and model compressibility, while also producing representations with invariant feature utilization, class separation, and activation sparsity.

Contribution

The paper identifies high learning rates as a way to improve robustness and compressibility at the same time, and it traces the mechanism behind this effect to confident mispredictions of bias-conflicting samples.

Findings

01. Large learning rates improve robustness to spurious correlations.

02. High learning rates promote invariant feature utilization and sparsity.

03. The effect is consistent across datasets, models, and optimizers.
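
To make two of the quantities in these findings concrete, here is a minimal sketch of how activation sparsity and accuracy on bias-conflicting samples might be measured. It is not the paper's evaluation code: the function names, the `tol` threshold, and the toy inputs are all assumptions chosen for illustration, and the paper may define these metrics differently.

```python
from typing import Dict, Sequence


def activation_sparsity(activations: Sequence[Sequence[float]], tol: float = 0.0) -> float:
    """Fraction of activation entries with magnitude at most `tol` (hypothetical metric)."""
    total = 0
    near_zero = 0
    for row in activations:
        for value in row:
            total += 1
            if abs(value) <= tol:
                near_zero += 1
    if total == 0:
        raise ValueError("activations must be non-empty")
    return near_zero / total


def group_accuracies(
    preds: Sequence[int],
    labels: Sequence[int],
    is_bias_conflicting: Sequence[bool],
) -> Dict[str, float]:
    """Accuracy computed separately on bias-aligned and bias-conflicting samples."""
    correct = {False: 0, True: 0}
    count = {False: 0, True: 0}
    for pred, label, conflicting in zip(preds, labels, is_bias_conflicting):
        key = bool(conflicting)
        count[key] += 1
        correct[key] += int(pred == label)
    return {
        # An empty group yields NaN rather than a division error.
        "bias_aligned": correct[False] / count[False] if count[False] else float("nan"),
        "bias_conflicting": correct[True] / count[True] if count[True] else float("nan"),
    }


if __name__ == "__main__":
    # Toy post-ReLU activations for two samples (made-up values).
    acts = [[0.0, 1.2, 0.0, 0.0], [0.5, 0.0, 0.0, 2.0]]
    print(activation_sparsity(acts))

    # Toy predictions: the last three samples are marked bias-conflicting.
    preds = [1, 0, 1, 1, 0, 0]
    labels = [1, 0, 1, 0, 0, 1]
    conflict = [False, False, False, True, True, True]
    print(group_accuracies(preds, labels, conflict))
```

Under these assumptions, comparing the two numbers for models trained at different learning rates would be one way to check the findings above: a more robust model should narrow the gap between bias-aligned and bias-conflicting accuracy, and a more compressible one should show a higher sparsity fraction.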

Abstract

Robustness and resource-efficiency are two highly desirable properties for modern machine learning models. However, achieving them jointly remains a challenge. In this paper, we identify high learning rates as a facilitator for simultaneously achieving robustness to spurious correlations and network compressibility. We demonstrate that large learning rates also produce desirable representation properties such as invariant feature utilization, class separation, and activation sparsity. Our findings indicate that large learning rates compare favorably to other hyperparameters and regularization methods, in consistently satisfying these properties in tandem. In addition to demonstrating the positive effect of large learning rates across diverse spurious correlation datasets, models, and optimizers, we also present strong evidence that the previously documented success of large learning…
