What Scalable Second-Order Information Knows for Pruning at Initialization
Ivo Gollini Navarrete, Nicolás Mauricio Cuadrado Ávila, Martin Takáč, Samuel Horváth

TL;DR
This paper demonstrates that scalable second-order approximations like Empirical Fisher and Hutchinson diagonals effectively identify critical parameters at initialization, improving pruning performance across various models and datasets with linear complexity.
Contribution
It introduces and empirically validates the use of scalable second-order approximations for pruning at initialization, offering a practical and effective alternative to traditional methods.
Findings
Hutchinson-based criteria outperform or match existing methods across models and datasets.
Updating batch normalization statistics improves data-dependent pruning criteria.
Second-order approximations balance efficiency and accuracy in pruning at initialization.
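As a rough illustration of a data-dependent second-order score, the sketch below computes a diagonal Empirical Fisher (the mean of squared per-sample gradients) for a toy logistic-regression model at a random initialization, then keeps the highest-scoring weights. The toy model, the OBD-style score 0.5 * F_ii * w_i^2, and the helper names empirical_fisher_diag and keep_mask are assumptions made for this example, not the paper's exact criteria or code.

```python
import numpy as np

def empirical_fisher_diag(w, X, y):
    """Diagonal empirical Fisher for logistic regression (toy stand-in model)."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))        # predicted probabilities
    per_sample_grads = (p - y)[:, None] * X   # log-loss gradient per sample, shape (n, d)
    return np.mean(per_sample_grads ** 2, axis=0)

def keep_mask(scores, sparsity):
    """Boolean mask that keeps the top (1 - sparsity) fraction of scores."""
    n_keep = max(1, int(round((1.0 - sparsity) * scores.size)))
    keep_idx = np.argsort(scores)[-n_keep:]
    mask = np.zeros(scores.size, dtype=bool)
    mask[keep_idx] = True
    return mask

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))                   # synthetic inputs (assumption)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)  # synthetic labels (assumption)
w0 = rng.normal(scale=0.1, size=10)              # weights "at initialization"

fisher = empirical_fisher_diag(w0, X, y)
scores = 0.5 * fisher * w0 ** 2                  # OBD-style saliency with F_ii in place of H_ii
mask = keep_mask(scores, sparsity=0.7)           # keep 30% of the weights
```

The cost is one pass over per-sample gradients and one vector of scores, so both time and memory grow linearly with the number of parameters.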
Abstract
Pruning remains an effective strategy for reducing both the costs and environmental impact associated with deploying large neural networks (NNs) while maintaining performance. Classical methods, such as OBD (LeCun et al., 1989) and OBS (Hassibi et al., 1992), demonstrate that utilizing curvature information can significantly enhance the balance between network complexity and performance. However, the computation and storage of the Hessian matrix make it impractical for modern NNs, motivating the use of approximations. Recent research (Gur et al., 2018; Karakida et al., 2019) suggests that the top eigenvalues guide optimization in a small subspace, are identifiable early, and remain consistent during training. Motivated by these findings, we revisit pruning at initialization (PaI) to evaluate scalable, unbiased second-order approximations, such as the Empirical Fisher and Hutchinson…
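The Hutchinson diagonal mentioned above can be sketched in a few lines. For a Rademacher vector z (independent entries of ±1), the elementwise product z ⊙ (Hz) has expectation diag(H), so averaging it over random draws gives an unbiased estimate using only Hessian-vector products. In this minimal example the Hessian is a small explicit matrix and hvp is a plain matrix-vector product; in a real network the product would come from automatic differentiation. The function name hutchinson_diag and the toy matrix are assumptions for illustration.

```python
import numpy as np

def hutchinson_diag(hvp, dim, n_samples, rng):
    """Estimate diag(H) from Hessian-vector products with Rademacher probes."""
    est = np.zeros(dim)
    for _ in range(n_samples):
        z = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe vector
        est += z * hvp(z)                      # z ⊙ (H z), unbiased for diag(H)
    return est / n_samples

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 8))
H = A @ A.T + np.eye(8)                        # toy symmetric "Hessian" (assumption)

diag_est = hutchinson_diag(lambda v: H @ v, dim=8, n_samples=2000, rng=rng)
max_abs_err = np.max(np.abs(diag_est - np.diag(H)))
```

Each probe needs one Hessian-vector product and the full matrix is never stored, which is what lets the estimator scale to large models. The error of the estimate shrinks as the number of probes grows.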
Taxonomy
Topics: Advanced Neural Network Applications · Neural Networks and Reservoir Computing · Domain Adaptation and Few-Shot Learning
Methods: Pruning
