Relating Regularization and Generalization through the Intrinsic Dimension of Activations

Bradley C.A. Brown; Jordan Juravsky; Anthony L. Caterini; Gabriel Loaiza-Ganem

arXiv:2211.13239·cs.LG·November 28, 2022

Open Access

TL;DR

This paper empirically links model regularization to improved generalization by analyzing the intrinsic dimension of activations, revealing how regularization affects internal representations and the dynamics of sudden generalization phenomena.

Contribution

It demonstrates that regularization reduces the intrinsic dimension of activations, correlating with better generalization, and provides insights into the dynamics of grokking and feature extraction in neural networks.

Findings

01. Regularization decreases last-layer intrinsic dimension (LLID).

02. Excessive regularization hampers feature extraction in early layers.

03. Grokking correlates with a sudden drop in LLID after training saturation.

Abstract

Given a pair of models with similar training set performance, it is natural to assume that the model that possesses simpler internal representations would exhibit better generalization. In this work, we provide empirical evidence for this intuition through an analysis of the intrinsic dimension (ID) of model activations, which can be thought of as the minimal number of factors of variation in the model's representation of the data. First, we show that common regularization techniques uniformly decrease the last-layer ID (LLID) of validation set activations for image classification models and show how this strongly affects generalization performance. We also investigate how excessive regularization decreases a model's ability to extract features from data in earlier layers, leading to a negative effect on validation accuracy even while LLID continues to decrease and training accuracy…
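
To make the notion of intrinsic dimension concrete, the following is a minimal sketch of one widely used estimator, the TwoNN maximum-likelihood formula, applied to a matrix of activations (one row per input). It is an illustration only and is not necessarily the estimator used in the paper; the function name `twonn_intrinsic_dimension` and the assumption that activations arrive as a NumPy array of shape `(n_samples, n_features)` are hypothetical choices made for this example.

```python
import numpy as np


def twonn_intrinsic_dimension(activations):
    """Estimate intrinsic dimension with the TwoNN maximum-likelihood formula.

    `activations` is assumed to be array-like of shape (n_samples, n_features),
    e.g. last-layer activations collected on a validation set.
    """
    x = np.asarray(activations, dtype=np.float64)

    # Pairwise squared Euclidean distances via the Gram-matrix identity.
    sq_norms = np.sum(x * x, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip small negative round-off
    np.fill_diagonal(sq_dists, np.inf)       # exclude each point from its own neighbours

    # Distances to the first and second nearest neighbour of every point.
    nearest = np.sqrt(np.partition(sq_dists, 1, axis=1)[:, :2])
    r1, r2 = nearest[:, 0], nearest[:, 1]

    # Drop exact duplicates (r1 == 0), for which the ratio is undefined.
    keep = r1 > 0
    mu = r2[keep] / r1[keep]

    # Under the TwoNN model the ratios are Pareto distributed with shape d,
    # so the maximum-likelihood estimate is d = N / sum(log(mu)).
    return keep.sum() / np.sum(np.log(mu))
```

As a sanity check, points sampled from a 2-dimensional latent space and mapped linearly into 64 dimensions should yield an estimate close to 2 even though each activation vector has 64 coordinates. This gap between ambient and intrinsic dimension is what a quantity such as LLID is meant to capture.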

Taxonomy

Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Gaussian Processes and Bayesian Inference