Over-parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning

Francois Caron; Fadhel Ayed; Paul Jung; Hoil Lee; Juho Lee; Hongseok Yang

arXiv:2302.01002 · stat.ML · February 19, 2025 · 1 citation

Open Access · 1 Repo

TL;DR

This paper studies wide, shallow neural networks with asymmetrical node scaling, proving they can learn features and converge globally, unlike traditional NTK models, with practical benefits for pruning and transfer learning.

Contribution

It introduces a novel asymmetrical node scaling approach for shallow neural networks, providing theoretical guarantees of convergence and feature learning.

Findings

01 For sufficiently large networks, gradient flow and gradient descent converge to a global minimum with high probability.

02 Networks with asymmetrical node scaling can learn features, unlike networks in the NTK parameterisation.

03 Experiments illustrate the theoretical results and point to benefits for prunability and transfer learning.

Abstract

We consider gradient-based optimisation of wide, shallow neural networks, where the output of each hidden node is scaled by a positive parameter. The scaling parameters are non-identical, differing from the classical Neural Tangent Kernel (NTK) parameterisation. We prove that for large such neural networks, with high probability, gradient flow and gradient descent converge to a global minimum and can learn features in some sense, unlike in the NTK parameterisation. We perform experiments illustrating our theoretical results and discuss the benefits of such scaling in terms of prunability and transfer learning.
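The setup in the abstract can be illustrated with a minimal sketch in plain Python. The abstract only states that each hidden node's output is scaled by a positive, non-identical parameter; everything else here is an assumption made for illustration: the ReLU activation, the fixed ±1 output signs, the square root applied to each scale, the 1/sqrt(d) input normalisation, and the hypothetical `asymmetric_scales` rule that mixes a uniform term with power-law decaying weights. The function names are invented for this example and are not taken from the paper or its repository.

```python
import math
import random


def uniform_scales(m):
    """Identical scale 1/m for every hidden node (NTK-style baseline)."""
    return [1.0 / m] * m


def asymmetric_scales(m, gamma=0.5, alpha=2.0):
    """Hypothetical non-identical positive scales that sum to 1.

    Mixes a uniform part (weight gamma) with normalised power-law
    weights j**(-alpha) (weight 1 - gamma). Assumed for illustration.
    """
    decay = [j ** (-alpha) for j in range(1, m + 1)]
    total = sum(decay)
    return [gamma / m + (1.0 - gamma) * w / total for w in decay]


def init_params(d, m, seed=0):
    """Gaussian hidden weights and fixed +/-1 output signs (assumed)."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    signs = [rng.choice([-1.0, 1.0]) for _ in range(m)]
    return weights, signs


def forward(x, weights, signs, scales):
    """Shallow network output: sum_j sqrt(scale_j) * sign_j * relu(w_j . x / sqrt(d))."""
    d = len(x)
    out = 0.0
    for w_j, a_j, lam_j in zip(weights, signs, scales):
        pre = sum(wi * xi for wi, xi in zip(w_j, x)) / math.sqrt(d)
        out += math.sqrt(lam_j) * a_j * max(pre, 0.0)
    return out
```

With `gamma=1.0` the hypothetical rule reduces to the identical 1/m scaling, so the same `forward` function covers both the baseline and the asymmetrical case; smaller `gamma` concentrates more of the total scale on the first few nodes.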

Code & Models

Repositories

anomdoubleblind/asymmetrical_scaling
PyTorch · Official

Taxonomy

Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques

Methods: Pruning · Neural Tangent Kernel