Random Sparse Lifts: Construction, Analysis and Convergence of finite sparse networks

David A. R. Robin (DI-ENS); Kevin Scaman (DI-ENS); Marc Lelarge (DI-ENS)

arXiv:2501.05930 · math.OC · January 13, 2025 · ICLR

Open Access

TL;DR

This paper introduces Random Sparse Lifts, a class of neural networks built so that training by gradient flow provably reaches arbitrarily low loss as the number of parameters grows, without overparameterization assumptions relating the number of parameters to the number of training samples.

Contribution

It defines a framework for constructing large sparse neural networks whose training by gradient flow is guaranteed to reach low loss, a step toward a deep learning convergence theory that does not rely on overparameterization assumptions.

Findings

01. The class contains architectures similar to most common deep learning models, obtained by sparsifying their weight tensors.

02. Convergence to low loss is proven using tools from algebraic topology and random graph theory.

03. Architectures are defined from a simple computation graph and a mechanism to lift it, which increases the number of parameters and generalizes widening a multi-layer perceptron.
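To make the findings above more concrete, here is a minimal sketch of one way to widen a multi-layer perceptron and sparsify its weights at random. It is an illustration under stated assumptions, not the paper's construction: the Bernoulli mask, the `lift` multiplier, the `keep_prob` parameter, and all function names are hypothetical choices made for this example.

```python
import random


def init_sparse_layer(n_out, n_in, keep_prob, rng):
    """Return (weights, mask) for one layer.

    Assumption: each connection is kept independently with probability
    keep_prob (a Bernoulli mask). Masked-out weights are set to zero.
    """
    mask = [[1 if rng.random() < keep_prob else 0 for _ in range(n_in)]
            for _ in range(n_out)]
    weights = [[rng.gauss(0.0, 1.0) * m for m in row] for row in mask]
    return weights, mask


def build_sparse_mlp(widths, lift, keep_prob, seed=0):
    """Widen every hidden layer by the factor `lift`, then sparsify.

    Input and output sizes (widths[0], widths[-1]) are left unchanged.
    """
    rng = random.Random(seed)
    lifted = [widths[0]] + [w * lift for w in widths[1:-1]] + [widths[-1]]
    return [init_sparse_layer(n_out, n_in, keep_prob, rng)
            for n_in, n_out in zip(lifted[:-1], lifted[1:])]


def forward(layers, x):
    """ReLU on hidden layers, linear output layer."""
    for i, (weights, _mask) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) for row in weights]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


def count_params(layers):
    """Number of connections kept by the masks (active parameters)."""
    return sum(sum(row) for _weights, mask in layers for row in mask)


if __name__ == "__main__":
    net = build_sparse_mlp(widths=[3, 4, 2], lift=5, keep_prob=0.2)
    print("active parameters:", count_params(net))
    print("output:", forward(net, [1.0, -0.5, 2.0]))
```

In this sketch the mask is drawn once at initialization; a training loop would update only the kept weights. Increasing `lift` while lowering `keep_prob` gives a wider network with proportionally fewer connections per unit, which is the informal picture of a large sparse network used here.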

Abstract

We present a framework to define a large class of neural networks for which, by construction, training by gradient flow provably reaches arbitrarily low loss when the number of parameters grows. Distinct from the fixed-space global optimality of non-convex optimization, this new form of convergence, and the techniques introduced to prove such convergence, pave the way for a usable deep learning convergence theory in the near future, without overparameterization assumptions relating the number of parameters and training samples. We define these architectures from a simple computation graph and a mechanism to lift it, thus increasing the number of parameters, generalizing the idea of increasing the widths of multi-layer perceptrons. We show that architectures similar to most common deep learning models are present in this class, obtained by sparsifying the weight tensors of usual…

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Neural Networks and Applications