How Does Overparameterization Affect Features?
Ahmet Cagri Duzgun, Samy Jelassi, Yuanzhi Li

TL;DR
This paper investigates how overparameterization shapes the features that neural networks learn. Comparing networks with the same architecture but different widths, it finds that overparameterized models learn distinctive features and outperform underparameterized ones, with supporting experiments on CIFAR-10 and MNLI.
Contribution
The paper compares the feature expressivity of over- and underparameterized networks, highlights the features unique to overparameterized models, and proposes a toy explanation for them.
Findings
Overparameterized networks learn features that cannot be spanned by concatenating the features of many underparameterized networks, and vice versa.
Overparameterized models outperform underparameterized models, even when many of the latter are concatenated.
Experimental validation on CIFAR-10 and MNLI datasets supports these conclusions.
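The spanning claim in the findings above can be made concrete with a small numerical sketch. This is not the paper's metric, which this summary does not spell out. It is one plausible way to ask whether one feature set lies in the linear span of another: fit a least-squares map from the source features to the target features and report the fraction of target variance left unexplained. The function name `span_residual` and the synthetic `latent`, `narrow_concat`, and `wide` arrays are hypothetical stand-ins for real network activations.

```python
import numpy as np

def span_residual(source, target):
    """Fraction of variance in `target` not linearly explained by `source`.

    Both arguments are (n_samples, n_features) arrays. A value near 0 means
    `target` lies (almost) in the linear span of `source`; a value near 1
    means `source` explains almost nothing. Measured in-sample for brevity.
    """
    source = source - source.mean(axis=0)
    target = target - target.mean(axis=0)
    # Minimum-norm least-squares solution; tolerates rank-deficient sources.
    weights, *_ = np.linalg.lstsq(source, target, rcond=None)
    residual = target - source @ weights
    return float((residual ** 2).sum() / (target ** 2).sum())

rng = np.random.default_rng(0)
n_samples = 2000
latent = rng.normal(size=(n_samples, 10))  # hypothetical underlying factors

# Assumption for illustration: five "narrow" models only see factors 0..5,
# while one "wide" model only sees factors 4..9, so each side has
# directions the other cannot reach.
narrow_models = [latent[:, :6] @ rng.normal(size=(6, 3)) for _ in range(5)]
narrow_concat = np.concatenate(narrow_models, axis=1)  # shape (2000, 15)
wide = latent[:, 4:] @ rng.normal(size=(6, 16))         # shape (2000, 16)

wide_from_narrow = span_residual(narrow_concat, wide)
narrow_from_wide = span_residual(wide, narrow_concat)
print(f"wide unexplained by concatenated narrow: {wide_from_narrow:.2f}")
print(f"concatenated narrow unexplained by wide: {narrow_from_wide:.2f}")
```

In this synthetic setup both residuals stay well above zero, which mirrors the "and vice versa" structure of the finding: concatenating more narrow feature sets does not help if they all miss the same directions. With real models, the arrays would be penultimate-layer activations on a shared evaluation set, and the residual would be better measured on held-out samples.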
Abstract
Overparameterization, the condition where models have more parameters than necessary to fit their training loss, is a crucial factor for the success of deep learning. However, the characteristics of the features learned by overparameterized networks are not well understood. In this work, we explore this question by comparing models with the same architecture but different widths. We first examine the expressivity of the features of these models, and show that the feature space of overparameterized networks cannot be spanned by concatenating many underparameterized features, and vice versa. This reveals that both overparameterized and underparameterized networks acquire some distinctive features. We then evaluate the performance of these models, and find that overparameterized networks outperform underparameterized networks, even when many of the latter are concatenated. We corroborate…
Taxonomy
Topics: Image Retrieval and Classification Techniques · Face and Expression Recognition · Machine Learning and Data Classification
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Softmax · Layer Normalization · VGG-16 · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Adam
