Feature learning is decoupled from generalization in high capacity neural networks

Niclas Alexander Göring; Charles London; Abdurrahman Hadi Erturk; Chris Mingard; Yoonsoo Nam; Ard A. Louis

arXiv:2507.19680·cs.LG·July 29, 2025

TL;DR

This paper separates feature learning from generalization in neural networks: high-capacity networks can learn features strongly without generalizing better, which challenges existing theories of feature learning.

Contribution

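The paper introduces feature quality, a measure of the performance improvement gained from learned features, and shows empirically that current theories primarily assess the strength of feature learning rather than the quality of the learned features, leaving a gap in the understanding of neural network generalization.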

Findings

01

Neural networks outperform kernel methods on staircase functions.

02

Existing theories mainly assess feature learning strength, not quality.

03

Feature quality is crucial for understanding generalization in neural networks.

Abstract

Neural networks outperform kernel methods, sometimes by orders of magnitude, e.g. on staircase functions. This advantage stems from the ability of neural networks to learn features, adapting their hidden representations to better capture the data. We introduce a concept we call feature quality to measure this performance improvement. We examine existing theories of feature learning and demonstrate empirically that they primarily assess the strength of feature learning, rather than the quality of the learned features themselves. Consequently, current theories of feature learning do not provide a sufficient foundation for developing theories of neural network generalization.
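
For readers unfamiliar with the staircase functions mentioned in the abstract, the following is a minimal sketch of one common form from the literature: a sum of nested monomials over inputs in {-1, +1}^d. It is an assumption that the paper uses this exact form; the function name `staircase` and the parameter `k` are illustrative and do not come from the source.

```python
import numpy as np

def staircase(x, k):
    """Degree-k staircase: x1 + x1*x2 + ... + x1*x2*...*xk.

    Assumed form (not taken from the paper): inputs are rows of
    +/-1 values, and only the first k coordinates are relevant.
    """
    x = np.asarray(x)
    # Cumulative products along the last axis give the nested
    # monomials x1, x1*x2, ..., x1*...*xk.
    monomials = np.cumprod(x[..., :k], axis=-1)
    return monomials.sum(axis=-1)

# Hypothetical usage: sample points uniformly from the hypercube.
rng = np.random.default_rng(0)
X = rng.choice([-1, 1], size=(5, 8))  # 5 samples, 8 input dimensions
y = staircase(X, k=3)                 # targets depend on x1, x2, x3 only
```

Each term extends the previous one by a single coordinate, so the target has a hierarchical structure: lower-degree terms hint at which coordinates matter for the higher-degree ones.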
