On the Unreasonable Effectiveness of Last-layer Retraining

John C. Hill; Tyler LaBonte; Xinchen Zhang; Vidya Muthukumar

arXiv:2512.01766 · cs.LG · May 15, 2026

TL;DR

Last-layer retraining (LLR) improves neural network robustness and minority group performance, primarily due to better group balance in the held-out set rather than neural collapse mitigation.

Contribution

The paper challenges the neural collapse hypothesis and provides evidence that better group balance in the held-out set explains LLR's effectiveness. It also highlights algorithms, such as CB-LLR and AFR, that balance groups implicitly.
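
To make "group balance in the held-out set" concrete, here is a minimal sketch that measures it, assuming each held-out example carries an integer group id (for instance, one id per (class, spurious attribute) pair). The function names and the max/min size ratio are illustrative choices, not the paper's own balance metric.

```python
import numpy as np


def group_proportions(groups: np.ndarray, num_groups: int) -> np.ndarray:
    """Fraction of examples in each group; group ids are ints in [0, num_groups)."""
    counts = np.bincount(groups, minlength=num_groups)
    return counts / counts.sum()


def imbalance_ratio(groups: np.ndarray, num_groups: int) -> float:
    """Size of the largest group divided by the size of the smallest group.

    Returns 1.0 for a perfectly balanced set and inf if some group is empty.
    """
    counts = np.bincount(groups, minlength=num_groups)
    if counts.min() == 0:
        return float("inf")
    return float(counts.max() / counts.min())


# Hypothetical held-out set with 4 groups.
held_out_groups = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 3])
print(group_proportions(held_out_groups, 4))  # [0.6 0.2 0.1 0.1]
print(imbalance_ratio(held_out_groups, 4))    # 6.0
```

Under this measure, a held-out set is "better balanced" when its ratio is closer to 1.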

Findings

01. LLR improves worst-group accuracy even with imbalanced held-out sets
02. Neural collapse does not explain LLR's effectiveness
03. Implicit group balancing in algorithms like CB-LLR and AFR enhances robustness
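
Finding 03 credits implicit group balancing in algorithms like CB-LLR. The sketch below shows one way class balancing can act as group balancing. It assumes that CB-LLR downsamples the held-out set so that every class has equal size; the paper's exact procedure may differ, and AFR's reweighting is not shown. The function name and toy group sizes are hypothetical. When the class label is correlated with a spurious attribute, downsampling the majority class also shrinks the majority group. In this toy set, that group falls from 80% of the examples to at most 50%, because class 0 can contribute only 15 of the 30 retained examples.

```python
import numpy as np


def class_balanced_subsample(labels: np.ndarray, seed: int = 0) -> np.ndarray:
    """Indices of a subset with the same number of examples per class,
    obtained by randomly downsampling every class to the smallest class size."""
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    n_min = min(int(np.sum(labels == c)) for c in classes)
    picked = [rng.choice(np.flatnonzero(labels == c), size=n_min, replace=False)
              for c in classes]
    return np.sort(np.concatenate(picked))


# Toy held-out set: class label y is correlated with a spurious attribute a.
# Group sizes: (y=0, a=0): 80, (y=0, a=1): 5, (y=1, a=0): 5, (y=1, a=1): 10.
y = np.array([0] * 85 + [1] * 15)
a = np.array([0] * 80 + [1] * 5 + [0] * 5 + [1] * 10)
group = 2 * y + a  # group ids 0..3; group 0 is the majority group

idx = class_balanced_subsample(y)
print(np.bincount(y[idx]))       # [15 15]: classes are now balanced
print((group == 0).mean())       # majority-group share before: 0.8
print((group[idx] == 0).mean())  # share after: at most 0.5, exact value depends on the seed
```

Note that class balancing alone does not guarantee balanced groups: the (y=0, a=1) group may still be small or even absent after subsampling.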

Abstract

Last-layer retraining (LLR) methods, wherein the last layer of a neural network is reinitialized and retrained on a held-out set following ERM training, have garnered interest as an efficient approach to rectify dependence on spurious correlations and improve performance on minority groups. Surprisingly, LLR has been found to improve worst-group accuracy even when the held-out set is an imbalanced subset of the training set. We initially hypothesize that this "unreasonable effectiveness" of LLR is explained by its ability to mitigate neural collapse through the held-out set, resulting in the implicit bias of gradient descent benefiting robustness. Our empirical investigation does not support this hypothesis. Instead, we present strong evidence for an alternative hypothesis: that the success of LLR is primarily due to better group balance in the held-out set. We conclude by showing…
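
The LLR procedure described above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the ERM-trained backbone is frozen and its penultimate-layer features for the held-out set are precomputed, and that the last layer is a linear softmax classifier, reinitialized to zero and fit by full-batch gradient descent on mean cross-entropy. The function names, learning rate, and step count are hypothetical. Worst-group accuracy is computed as the minimum accuracy over groups.

```python
import numpy as np


def train_last_layer(feats, labels, num_classes, lr=0.1, steps=500):
    """Reinitialize a linear head at zero and fit it by full-batch gradient
    descent on mean softmax cross-entropy over held-out features."""
    n, d = feats.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(steps):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(W, b, feats):
    return np.argmax(feats @ W + b, axis=1)


def worst_group_accuracy(preds, labels, groups):
    """Minimum over groups of the accuracy within that group."""
    accs = [np.mean(preds[groups == g] == labels[groups == g])
            for g in np.unique(groups)]
    return min(accs)


# Toy held-out features (hypothetical): feature 0 separates the two classes.
feats = np.array([[1.0, 0.0], [2.0, 0.0], [-1.0, 0.0], [-2.0, 0.0]])
labels = np.array([0, 0, 1, 1])
groups = np.array([0, 1, 2, 3])
W, b = train_last_layer(feats, labels, num_classes=2)
print(worst_group_accuracy(predict(W, b, feats), labels, groups))  # 1.0
```

In practice the held-out features would come from the frozen network, and worst-group accuracy would be measured on a separate test set rather than on the retraining data.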
