Mechanism of feature learning in deep fully connected networks and kernel machines that recursively learn features

Adityanarayanan Radhakrishnan; Daniel Beaglehole; Parthe Pandit; Mikhail Belkin

arXiv:2212.13881 · cs.LG · May 11, 2023

TL;DR

This paper identifies the mechanism behind feature learning in deep neural networks, proposes a theoretical framework that explains how features are selected, and uses it to build a novel, backpropagation-free feature learning method applicable to various models.

Contribution

It introduces the Deep Neural Feature Ansatz, a new theory explaining how neural networks learn features, and develops Recursive Feature Machines, kernel methods that recursively learn features and achieve state-of-the-art performance.
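
The following is a minimal NumPy sketch of the Recursive Feature Machine idea, not the authors' released code. It assumes a Laplace kernel with a learned Mahalanobis metric, K_M(x, z) = exp(-||x - z||_M / L), kernel ridge regression as the base predictor, and the average gradient outer product (AGOP) of the fitted predictor as the update for M. The function names (laplace_kernel_M, fit_kernel, rfm, predict), the bandwidth L, the ridge term reg, the iteration count, and the rescaling of M to trace d are all illustrative assumptions.

```python
import numpy as np

def laplace_kernel_M(X, Z, M, L):
    """K_M(x, z) = exp(-||x - z||_M / L) with ||v||_M = sqrt(v^T M v); M symmetric PSD."""
    XM = X @ M
    sq = (np.sum(XM * X, axis=1)[:, None]
          + np.sum((Z @ M) * Z, axis=1)[None, :]
          - 2.0 * XM @ Z.T)
    dist = np.sqrt(np.clip(sq, 0.0, None))  # clip tiny negative round-off
    return np.exp(-dist / L), dist

def fit_kernel(X, y, M, L, reg):
    """Kernel ridge regression coefficients alpha = (K + reg * I)^{-1} y."""
    K, dist = laplace_kernel_M(X, X, M, L)
    alpha = np.linalg.solve(K + reg * np.eye(X.shape[0]), y)
    return alpha, K, dist

def rfm(X, y, iters=5, L=1.0, reg=1e-3):
    """Hypothetical RFM loop: fit kernel predictor, then set M to its AGOP."""
    n, d = X.shape
    M = np.eye(d)  # start from the ordinary isotropic Laplace kernel
    for _ in range(iters):
        alpha, K, dist = fit_kernel(X, y, M, L, reg)
        # grad f(x_i) = -(1/L) * sum_j alpha_j K_ij M (x_i - x_j) / ||x_i - x_j||_M
        W = alpha[None, :] * K / np.where(dist > 0, dist, 1.0)
        np.fill_diagonal(W, 0.0)  # drop the non-differentiable self term
        G = -((X * W.sum(axis=1, keepdims=True)) - W @ X) @ M / L  # row i = grad f(x_i)
        M = G.T @ G / n  # average gradient outer product (AGOP)
        tr = np.trace(M)
        if tr > 0:
            M = M * (d / tr)  # assumption: rescale to trace d for numerical stability
    alpha, _, _ = fit_kernel(X, y, M, L, reg)
    return M, alpha

def predict(X_train, X_test, M, alpha, L=1.0):
    """Evaluate f(x) = sum_j alpha_j K_M(x, x_j) on new inputs."""
    K, _ = laplace_kernel_M(X_test, X_train, M, L)
    return K @ alpha

# Usage on a hypothetical target that depends only on coordinates 0 and 1.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = X[:, 0] * X[:, 1]
M, alpha = rfm(X, y)
print(np.round(np.diag(M), 2))  # inspect which coordinates M up-weights
```

The learned matrix M plays the role of a feature extractor: large diagonal entries mark input coordinates the predictor is sensitive to. No backpropagation is involved, since the gradients come in closed form from the kernel.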

Findings

01. Deep fully connected networks learn features through the average gradient outer product (AGOP).

02. The proposed mechanism explains phenomena such as the emergence of spurious features and the lottery ticket hypothesis.

03. Recursive Feature Machines achieve state-of-the-art performance on tabular data, outperforming existing models.

Abstract

In recent years, neural networks have achieved impressive results on many technological and scientific tasks. Yet the mechanism through which these models automatically select features, or patterns in data, for prediction remains unclear. Identifying such a mechanism is key to advancing the performance and interpretability of neural networks and to promoting reliable adoption of these models in scientific applications. In this paper, we identify and characterize the mechanism through which deep fully connected neural networks learn features. We posit the Deep Neural Feature Ansatz, which states that neural feature learning occurs by implementing the average gradient outer product to up-weight features strongly related to model output. Our ansatz sheds light on various deep learning phenomena, including the emergence of spurious features and simplicity biases, and how pruning networks can increase…
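
For a predictor f and inputs x_1, ..., x_n, the average gradient outer product is the d x d matrix (1/n) * sum_i grad f(x_i) grad f(x_i)^T. The toy sketch below, using a hypothetical target f(x) = x0 * x1 on 5-dimensional Gaussian inputs with an analytic gradient, shows the sense in which the AGOP "up-weights features strongly related to model output": coordinates the function ignores get zero rows and columns. The helper names agop and grad_f are illustrative, not taken from the paper.

```python
import numpy as np

def agop(grad_fn, X):
    """Average gradient outer product: (1/n) * sum_i grad f(x_i) grad f(x_i)^T."""
    G = np.stack([grad_fn(x) for x in X])  # (n, d) matrix of input gradients
    return G.T @ G / X.shape[0]

def grad_f(x):
    """Analytic gradient of the hypothetical target f(x) = x[0] * x[1]."""
    g = np.zeros_like(x)
    g[0] = x[1]
    g[1] = x[0]
    return g

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))
M = agop(grad_f, X)
print(np.round(np.diag(M), 2))  # coordinates 2-4 are exactly zero
```

Here the diagonal entries for coordinates 0 and 1 are sample means of x1^2 and x0^2 (close to 1), while coordinates 2 to 4, which never affect the output, receive zero weight.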

Taxonomy

Topics: Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning

Methods: Pruning