The Iterative Optimal Brain Surgeon: Faster Sparse Recovery by Leveraging Second-Order Information
Diyuan Wu, Ionut-Vlad Modoranu, Mher Safaryan, Denis Kuznedelev, Dan Alistarh

TL;DR
This paper introduces an iterative optimal brain surgeon method that leverages second-order curvature information to improve sparse recovery in deep neural networks, providing theoretical guarantees and demonstrating strong empirical results on large-scale models.
Contribution
The paper develops new sparse recovery algorithms inspired by OBS, with theoretical guarantees, and applies them to prune large-scale Transformer models.
Findings
Improved convergence bounds for sparse recovery algorithms.
Enhanced pruning performance on large-scale vision and language models.
Theoretical analysis connecting OBS and sparse recovery methods.
Abstract
The rising footprint of machine learning has led to a focus on imposing \emph{model sparsity} as a means of reducing computational and memory costs. For deep neural networks (DNNs), the state-of-the-art accuracy-vs-sparsity is achieved by heuristics inspired by the classical Optimal Brain Surgeon (OBS) framework~\citep{lecun90brain, hassibi1992second, hassibi1993optimal}, which leverages loss curvature information to make better pruning decisions. Yet, these results still lack a solid theoretical understanding, and it is unclear whether they can be improved by leveraging connections to the wealth of work on sparse recovery algorithms. In this paper, we draw new connections between these two areas and present new sparse recovery algorithms inspired by the OBS framework that come with theoretical guarantees under reasonable assumptions and have strong practical performance. Specifically,…
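For readers unfamiliar with how OBS uses curvature, here is a minimal sketch of a single classical OBS pruning step (as in the cited Hassibi and Stork line of work), not the iterative algorithm this paper proposes. It assumes a quadratic loss around a trained weight vector `w` with a known, invertible Hessian; the function name `obs_prune_one` and the toy Hessian are hypothetical and chosen only for illustration.

```python
import numpy as np

def obs_prune_one(w, H_inv):
    """One classical OBS step (illustrative sketch, hypothetical helper).

    Picks the weight whose removal increases a quadratic loss the least,
    then adjusts the remaining weights to compensate.
    """
    diag = np.diag(H_inv)
    # Saliency: loss increase caused by removing weight q with compensation.
    saliency = w**2 / (2.0 * diag)
    q = int(np.argmin(saliency))
    # Compensating update: drives w[q] to zero and shifts the other weights.
    delta = -(w[q] / diag[q]) * H_inv[:, q]
    w_new = w + delta
    w_new[q] = 0.0  # remove floating-point residue on the pruned coordinate
    return w_new, q, saliency[q]

# Toy problem: a random positive-definite Hessian (an assumption for the demo).
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
H = A.T @ A + 0.1 * np.eye(5)
w = rng.standard_normal(5)

w_new, q, rho = obs_prune_one(w, np.linalg.inv(H))
loss_increase = 0.5 * (w_new - w) @ H @ (w_new - w)
naive_increase = 0.5 * w[q] ** 2 * H[q, q]  # zeroing w[q] without compensation
```

In this sketch `loss_increase` matches the predicted saliency `rho` and does not exceed `naive_increase`, which is the sense in which curvature information leads to better pruning decisions than zeroing a weight in isolation.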
Taxonomy
Topics: Atomic and Subatomic Physics Research · Advanced MRI Techniques and Applications
Methods: Pruning · Focus
