Feature Learning in $L_{2}$-regularized DNNs: Attraction/Repulsion and Sparsity

Arthur Jacot; Eugene Golikov; Clément Hongler; Franck Gabriel

arXiv:2205.15809·stat.ML·October 17, 2022·1 cites

TL;DR

This paper analyzes the loss surface of L2-regularized deep neural networks, revealing how feature representations evolve through attraction and repulsion dynamics, and establishes bounds on the number of neurons needed for local minima.

Contribution

The paper reformulates the loss in terms of the layerwise activations and, for homogeneous DNNs, in terms of their covariances. It uses the second reformulation to prove a tight bound on the number of neurons per hidden layer needed to realize any local minimum.

Findings

01

Layerwise activations follow attraction/repulsion dynamics.

02

Any local minimum can be achieved with at most $N(N+1)$ neurons per hidden layer, where $N$ is the number of training points.

03

Numerical experiments show fewer neurons often suffice in practice.

Abstract

We study the loss surface of DNNs with $L_{2}$ regularization. We show that the loss in terms of the parameters can be reformulated into a loss in terms of the layerwise activations $Z_{\ell}$ of the training set. This reformulation reveals the dynamics behind feature learning: each hidden representation $Z_{\ell}$ is optimal w.r.t. an attraction/repulsion problem and interpolates between the input and output representations, keeping as little information from the input as necessary to construct the activation of the next layer. For positively homogeneous non-linearities, the loss can be further reformulated in terms of the covariances of the hidden representations, which takes the form of a partially convex optimization over a convex cone. This second reformulation allows us to prove a sparsity result for homogeneous DNNs: any local minimum of the $L_{2}$-regularized loss can be…
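The objects named in the abstract can be made concrete with a small NumPy sketch. It is an illustration under stated assumptions, not the paper's code: the bias-free fully connected architecture, the ReLU non-linearity (one example of a positively homogeneous non-linearity), the mean-squared-error cost, the layer sizes, and the names `forward` and `regularized_loss` are all hypothetical choices. The sketch computes the layerwise activations $Z_{\ell}$ of a training set, the $L_{2}$-regularized loss in terms of the parameters, and the $N \times N$ Gram matrices of the hidden representations, which is one plausible reading of "covariances" here.

```python
import numpy as np

def relu(x):
    # ReLU is positively homogeneous: relu(a * x) == a * relu(x) for a > 0.
    return np.maximum(x, 0.0)

def forward(weights, X):
    """Return the layerwise activations [Z_0, ..., Z_L] of the training set.

    X has shape (d_in, N), one column per training point.
    Hidden layers use ReLU; the last layer is linear (an assumption of this sketch).
    """
    Z = [X]
    for i, W in enumerate(weights):
        pre = W @ Z[-1]
        is_last = i == len(weights) - 1
        Z.append(pre if is_last else relu(pre))
    return Z

def regularized_loss(weights, X, Y, lam):
    """Mean squared error plus lam times the sum of squared weights."""
    outputs = forward(weights, X)[-1]
    cost = np.mean((outputs - Y) ** 2)
    penalty = sum(np.sum(W ** 2) for W in weights)
    return cost + lam * penalty

rng = np.random.default_rng(0)
N, d_in, width, d_out = 5, 3, 8, 2  # hypothetical sizes
X = rng.normal(size=(d_in, N))
Y = rng.normal(size=(d_out, N))
weights = [
    rng.normal(size=(width, d_in)),
    rng.normal(size=(width, width)),
    rng.normal(size=(d_out, width)),
]

Z = forward(weights, X)

# N x N Gram matrices of the hidden representations Z_1 and Z_2.
grams = [z.T @ z for z in Z[1:-1]]

# Rescaling two consecutive layers by a and 1/a leaves the network
# outputs unchanged (homogeneity) but changes the L2 penalty.
a = 2.0
rescaled = [a * weights[0], weights[1] / a, weights[2]]
same_output = np.allclose(forward(rescaled, X)[-1], Z[-1])
loss_original = regularized_loss(weights, X, Y, lam=0.1)
loss_rescaled = regularized_loss(rescaled, X, Y, lam=0.1)
```

In this sketch the data-fitting cost depends on the weights only through the activations, while the $L_{2}$ penalty distinguishes between parameter settings that produce the same outputs. That is the sense in which the regularizer, rather than the cost alone, shapes the hidden representations.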

Videos

Feature Learning in $L_{2}$-regularized DNNs: Attraction/Repulsion and Sparsity · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Machine Learning and ELM