With Greater Distance Comes Worse Performance: On the Perspective of Layer Utilization and Model Generalization
James Wang, Cheng-Lin Yang

TL;DR
This paper investigates how different neural network layers contribute to generalization, revealing that early layers learn more general features, while deeper layers tend to overfit, and proposes weight re-initialization as a regularization method.
Contribution
It provides empirical evidence on layer-specific contributions to generalization and introduces weight re-initialization of final layers as a novel regularization technique.
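As a rough illustration of the idea, re-initializing the final layers can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' procedure: the summary does not say which initializer is used, how many layers are reset, or when the reset happens during training. The helper names `init_layer` and `reinit_final_layers`, the uniform initializer, and the flat-list weight layout are all hypothetical.

```python
import random


def init_layer(n_in, n_out, rng):
    """Hypothetical initializer (an assumption, not from the paper):
    uniform in [-1/sqrt(n_in), 1/sqrt(n_in)], stored as a flat list."""
    bound = 1.0 / (n_in ** 0.5)
    return [rng.uniform(-bound, bound) for _ in range(n_in * n_out)]


def reinit_final_layers(weights, shapes, num_final, rng):
    """Return a copy of `weights` in which the last `num_final` layers
    are replaced by freshly initialized weights.

    weights   -- list of flat weight lists, one per layer, input to output
    shapes    -- list of (n_in, n_out) tuples, one per layer
    num_final -- how many final layers to reset (a free choice here)
    """
    if len(weights) != len(shapes):
        raise ValueError("weights and shapes must have the same length")
    if not 0 <= num_final <= len(weights):
        raise ValueError("num_final must be between 0 and the layer count")

    keep = len(weights) - num_final
    # Early layers are copied unchanged; the input list is not mutated.
    new_weights = [list(layer) for layer in weights[:keep]]
    # Final layers are drawn again from the initializer.
    for n_in, n_out in shapes[keep:]:
        new_weights.append(init_layer(n_in, n_out, rng))
    return new_weights
```

In this sketch the early layers keep their trained values, which matches the summary's claim that they hold the more general features, while only the layers suspected of overfitting are reset.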
Findings
Early layers learn features relevant to both training and testing data.
Deeper layers mainly minimize training risk and generalize poorly to test or mislabeled data.
The distance of the final layers' trained weights from their initial values correlates strongly with generalization error.
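The distance-to-initialization quantity in the last finding could be measured per layer roughly as follows. This is a sketch under assumptions: the summary does not state which norm the authors use, so the Euclidean (Frobenius) norm here is a guess, and the function names and the dict-of-flat-lists layout are hypothetical.

```python
import math


def layer_distance(trained, initial):
    """Euclidean (Frobenius) distance between two flat weight lists.
    The choice of norm is an assumption, not taken from the paper."""
    if len(trained) != len(initial):
        raise ValueError("weight lists must have the same length")
    return math.sqrt(sum((t - i) ** 2 for t, i in zip(trained, initial)))


def distances_per_layer(trained_layers, initial_layers):
    """Map each layer name to the distance its weights moved in training.

    Both arguments are dicts from layer name to a flat list of weights,
    snapshotted at initialization and after training respectively.
    """
    return {
        name: layer_distance(trained_layers[name], initial_layers[name])
        for name in initial_layers
    }
```

Saving a snapshot of the weights at initialization is the only extra bookkeeping needed; the per-layer distances can then be compared against the gap between training and test error across runs.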
Abstract
Generalization of deep neural networks remains one of the main open problems in machine learning. Previous theoretical works focused on deriving tight bounds on model complexity, while empirical works revealed that neural networks exhibit double descent with respect to both the number of training samples and the network size. In this paper, we empirically examine how different layers of a neural network contribute differently to the model; we find that early layers generally learn representations relevant to performance on both training data and testing data. In contrast, deeper layers only minimize training risk and fail to generalize well to testing or mislabeled data. We further illustrate that the distance of the final layers' trained weights from their initial values correlates highly with generalization error and can serve as an indicator of overfitting. Moreover, we show evidence…
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and Data Classification · Machine Learning and Algorithms
