Pruning is Optimal for Learning Sparse Features in High-Dimensions
Nuri Mert Vural, Murat A. Erdogdu

TL;DR
This paper gives a theoretical explanation for why pruning neural networks improves feature learning in high-dimensional settings: pruned networks trained with gradient descent can learn sparse models with better sample complexity than unpruned networks.
Contribution
The paper proves that pruning a neural network in proportion to the sparsity of the true model's relevant directions improves its sample complexity, and it establishes correlational statistical query (CSQ) lower bounds showing that pruned networks are optimal in high dimensions.
Findings
Pruned networks achieve optimal sample complexity for sparse models.
Unpruned, basis-independent methods are suboptimal in high-sparsity regimes.
CSQ lower bounds confirm the optimality of pruning in certain settings.
Abstract
While it is commonly observed in practice that pruning networks to a certain level of sparsity can improve the quality of the features, a theoretical explanation of this phenomenon remains elusive. In this work, we investigate this by demonstrating that a broad class of statistical models can be optimally learned using pruned neural networks trained with gradient descent, in high-dimensions. We consider learning both single-index and multi-index models of the form $y = \sigma^*(V^\top x) + \epsilon$, where $\sigma^*$ is a degree-$p$ polynomial, and $V \in \mathbb{R}^{d \times r}$ with $r \ll d$, is the matrix containing relevant model directions. We assume that $V$ satisfies a certain $\ell_q$-sparsity condition for matrices and show that pruning neural networks proportional to the sparsity level of $V$ improves their…
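The setting in the abstract can be made concrete with a small NumPy sketch. It samples data from a multi-index model whose direction matrix V (d rows, r columns) is nonzero on only k input coordinates, and defines a plain top-k magnitude mask. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, standard Gaussian inputs, exact row sparsity as a special case of the sparsity condition, the cubic Hermite link function, and the mask itself, which is generic magnitude pruning rather than the authors' procedure.

```python
import numpy as np


def sample_sparse_index_model(n, d, r, k, noise_std=0.1, seed=0):
    """Draw (X, y, V) from y = sigma_star(V^T x) + noise with a row-sparse V.

    Illustrative assumptions (not from the paper): Gaussian inputs, exactly
    k nonzero rows in V, and a degree-3 Hermite polynomial as the link.
    """
    if not r <= k <= d:
        raise ValueError("need r <= k <= d")
    rng = np.random.default_rng(seed)

    # Pick the k input coordinates that the relevant directions depend on.
    support = rng.choice(d, size=k, replace=False)

    # Orthonormalise a random k x r block, then embed it into a d x r matrix
    # so that all rows outside the support stay exactly zero.
    q_block, _ = np.linalg.qr(rng.standard_normal((k, r)))
    V = np.zeros((d, r))
    V[support] = q_block

    # Standard Gaussian inputs (assumption).
    X = rng.standard_normal((n, d))

    # Link function: sum of He_3(z) = z^3 - 3z over the r index coordinates.
    Z = X @ V
    y = np.sum(Z**3 - 3.0 * Z, axis=1) + noise_std * rng.standard_normal(n)
    return X, y, V


def top_k_mask(w, k):
    """Boolean mask that keeps the k largest-magnitude entries of a 1-D array."""
    mask = np.zeros(w.shape, dtype=bool)
    mask[np.argsort(np.abs(w))[-k:]] = True
    return mask


if __name__ == "__main__":
    X, y, V = sample_sparse_index_model(n=200, d=50, r=2, k=5)
    # Masking the row norms of V with budget k selects exactly its support.
    keep = top_k_mask(np.linalg.norm(V, axis=1), 5)
    print(X.shape, y.shape, int(keep.sum()))
```

In this toy setup the mask is applied to the true V only to show what "pruning proportional to the sparsity level" means for the coordinate budget; a learner would have to estimate which coordinates to keep from data.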
Taxonomy
Topics: Machine Learning and Data Classification · Face and Expression Recognition · Neural Networks and Applications
Methods: Pruning
