Learning sparse features can lead to overfitting in neural networks
Leonardo Petrini, Francesco Cagnetta, Eric Vanden-Eijnden, Matthieu Wyart

TL;DR
This paper investigates how learning sparse features in neural networks can cause overfitting when the target function is constant or smooth along certain directions of input space, in contrast to settings where feature learning helps.
Contribution
It proposes an explanation for when feature learning harms generalization: feature learning can produce a sparser neural representation than lazy training, and that sparsity yields less smooth predictors that overfit.
Findings
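Sparse feature learning can worsen generalization when the target function is constant or smooth along certain input directions.
Lazy training (via a random feature kernel or the NTK), which avoids feature learning, can outperform learned representations in those cases.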
Empirical evidence shows learned features can lead to less smooth, overfitting predictors.
Abstract
It is widely believed that the success of deep networks lies in their ability to learn a meaningful representation of the features of the data. Yet, understanding when and how this feature learning improves performance remains a challenge: for example, it is beneficial for modern architectures trained to classify images, whereas it is detrimental for fully-connected networks trained for the same task on the same data. Here we propose an explanation for this puzzle, by showing that feature learning can perform worse than lazy training (via random feature kernel or the NTK) as the former can lead to a sparser neural representation. Although sparsity is known to be essential for learning anisotropic data, it is detrimental when the target function is constant or smooth along certain directions of input space. We illustrate this phenomenon in two settings: (i) regression of Gaussian random…
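To make the "lazy training via random feature kernel" baseline concrete, here is a minimal sketch in Python. It is an illustration only and not the authors' setup: the function name `random_feature_ridge`, the ReLU features, the ridge penalty, and all default values are assumptions chosen for this example. The first layer is drawn at random and frozen, so no features are learned; only the linear readout is fit.

```python
import numpy as np

def random_feature_ridge(X_train, y_train, X_test,
                         n_features=512, ridge=1e-3, seed=0):
    """Ridge regression on fixed random ReLU features (hypothetical helper).

    The first-layer weights W and biases b are sampled once and never
    trained, which is what makes this a "lazy" (no feature learning) model.
    """
    rng = np.random.default_rng(seed)
    d = X_train.shape[1]
    W = rng.standard_normal((d, n_features)) / np.sqrt(d)  # frozen weights
    b = rng.standard_normal(n_features)                    # frozen biases

    def phi(X):
        # Random ReLU feature map, scaled so the kernel stays O(1).
        return np.maximum(X @ W + b, 0.0) / np.sqrt(n_features)

    Phi = phi(X_train)
    # Closed-form ridge solution for the readout weights only.
    A = Phi.T @ Phi + ridge * np.eye(n_features)
    a = np.linalg.solve(A, Phi.T @ y_train)
    return phi(X_test) @ a
```

A possible use is to call it on a toy target that depends on a single input coordinate, for example `y = X[:, 0]`, and compare its test error with that of a fully trained network on the same data. That comparison is a toy assumption for illustration, not one of the paper's experiments.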
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Statistical Methods and Inference · Medical Image Segmentation Techniques
