Relating Regularization and Generalization through the Intrinsic Dimension of Activations
Bradley C.A. Brown, Jordan Juravsky, Anthony L. Caterini, Gabriel Loaiza-Ganem

TL;DR
This paper empirically links model regularization to improved generalization by analyzing the intrinsic dimension of activations, revealing how regularization affects internal representations and the dynamics of sudden generalization phenomena.
Contribution
It demonstrates that regularization reduces the intrinsic dimension of activations, correlating with better generalization, and provides insights into the dynamics of grokking and feature extraction in neural networks.
Findings
Regularization decreases last-layer intrinsic dimension (LLID).
Excessive regularization hampers feature extraction in early layers.
Grokking correlates with a sudden drop in LLID after training saturation.
Abstract
Given a pair of models with similar training set performance, it is natural to assume that the model that possesses simpler internal representations would exhibit better generalization. In this work, we provide empirical evidence for this intuition through an analysis of the intrinsic dimension (ID) of model activations, which can be thought of as the minimal number of factors of variation in the model's representation of the data. First, we show that common regularization techniques uniformly decrease the last-layer ID (LLID) of validation set activations for image classification models and show how this strongly affects generalization performance. We also investigate how excessive regularization decreases a model's ability to extract features from data in earlier layers, leading to a negative effect on validation accuracy even while LLID continues to decrease and training accuracy…
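To make the notion of intrinsic dimension concrete, the sketch below estimates the ID of a point cloud with a maximum-likelihood variant of the TwoNN estimator, which uses only the ratio of each point's second to first nearest-neighbor distance. This excerpt does not say which estimator the authors use, so TwoNN is an assumption chosen for brevity. The function name `twonn_id` and the synthetic data standing in for last-layer activations are hypothetical.

```python
import numpy as np


def twonn_id(x):
    """Estimate intrinsic dimension with a maximum-likelihood TwoNN variant.

    x: array of shape (n_points, n_features), e.g. activations of one layer.
    Assumption: this is an illustrative estimator, not necessarily the one
    used in the paper.
    """
    x = np.asarray(x, dtype=np.float64)

    # Pairwise squared Euclidean distances via the Gram matrix.
    sq_norms = np.sum(x ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    np.maximum(d2, 0.0, out=d2)      # clip tiny negatives from rounding
    np.fill_diagonal(d2, np.inf)     # exclude each point's distance to itself

    # Two smallest distances per row: first and second nearest neighbors.
    nearest = np.partition(d2, 1, axis=1)[:, :2]
    r1 = np.sqrt(nearest[:, 0])
    r2 = np.sqrt(nearest[:, 1])

    # Drop exact duplicates (r1 == 0), for which the ratio is undefined.
    keep = r1 > 0
    mu = r2[keep] / r1[keep]

    # Under the TwoNN model mu is Pareto distributed with shape d,
    # so the maximum-likelihood estimate is n / sum(log(mu)).
    return float(keep.sum() / np.sum(np.log(mu)))


# Synthetic stand-in for activations: a 2-D plane embedded in 20 dimensions.
rng = np.random.default_rng(0)
latent = rng.uniform(size=(1000, 2))
embedding = rng.normal(size=(2, 20))
activations = latent @ embedding

print(f"ambient dimension: {activations.shape[1]}")
print(f"estimated ID: {twonn_id(activations):.2f}")
```

The ambient dimension here is 20, but the data varies along only two directions, so the estimate should land near 2. Applied to real validation-set activations from a model's last layer, the same function would give a rough LLID estimate. The dense pairwise-distance matrix limits this sketch to a few thousand points.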
Taxonomy
Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Gaussian Processes and Bayesian Inference
