Feature Learning in $L_{2}$-regularized DNNs: Attraction/Repulsion and Sparsity
Arthur Jacot, Eugene Golikov, Clément Hongler, Franck Gabriel

TL;DR
This paper analyzes the loss surface of L2-regularized deep neural networks, revealing how feature representations evolve through attraction and repulsion dynamics, and establishes bounds on the number of neurons needed for local minima.
Contribution
It reformulates the loss in terms of layerwise activations and covariances, and proves a tight bound on the number of neurons needed to achieve any local minimum in homogeneous DNNs.
Findings
Layerwise activations follow attraction/repulsion dynamics.
Any local minimum of the L2-regularized loss can be achieved with at most N(N+1) neurons per hidden layer, where N is the size of the training set.
Numerical experiments show fewer neurons often suffice in practice.
Abstract
We study the loss surface of DNNs with $L_{2}$ regularization. We show that the loss in terms of the parameters can be reformulated into a loss in terms of the layerwise activations of the training set. This reformulation reveals the dynamics behind feature learning: each hidden representation is optimal w.r.t. an attraction/repulsion problem and interpolates between the input and output representations, keeping as little information from the input as necessary to construct the activation of the next layer. For positively homogeneous non-linearities, the loss can be further reformulated in terms of the covariances of the hidden representations, which takes the form of a partially convex optimization over a convex cone. This second reformulation allows us to prove a sparsity result for homogeneous DNNs: any local minimum of the $L_{2}$-regularized loss can be…
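As a small illustration of what "positively homogeneous non-linearity" means here, the sketch below checks numerically that ReLU satisfies relu(c·x) = c·relu(x) for c > 0, and that rescaling consecutive layers of a bias-free network by c and 1/c leaves the output unchanged while changing the L2 penalty. This is not code from the paper: the two-layer network, the function names (`relu`, `two_layer_net`, `l2_penalty`), the shapes, and the constant `c` are all assumptions chosen for illustration.

```python
import numpy as np

def relu(x):
    # ReLU is positively homogeneous: relu(c * x) == c * relu(x) for c > 0.
    return np.maximum(x, 0.0)

def two_layer_net(x, w1, w2):
    # Hypothetical bias-free network: x -> w2 @ relu(w1 @ x).
    return w2 @ relu(w1 @ x)

def l2_penalty(w1, w2):
    # Sum of squared weights, i.e. the L2 regularization term (without the
    # regularization coefficient).
    return float(np.sum(w1 ** 2) + np.sum(w2 ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=(3,))      # one input vector (illustrative size)
w1 = rng.normal(size=(5, 3))   # first-layer weights
w2 = rng.normal(size=(2, 5))   # second-layer weights
c = 2.5                        # any positive rescaling factor

# Homogeneity of the non-linearity itself.
homogeneous = np.allclose(relu(c * x), c * relu(x))

# Rescaling (w1, w2) -> (c * w1, w2 / c) gives the same function ...
same_output = np.allclose(
    two_layer_net(x, w1, w2),
    two_layer_net(x, c * w1, w2 / c),
)

# ... but a different L2 penalty, so the regularizer distinguishes between
# parametrizations that compute the same function.
penalty_before = l2_penalty(w1, w2)
penalty_after = l2_penalty(c * w1, w2 / c)

print(homogeneous, same_output, penalty_before, penalty_after)
```

Because many parameter settings represent the same function but carry different penalties, an L2-regularized loss can be studied in terms of quantities other than the raw parameters; the sketch only demonstrates this invariance and does not reproduce the reformulation described in the abstract.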
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Machine Learning and ELM
