Optimal Brain Apoptosis
Mingyuan Sun, Zheng Fang, Jiaxu Wang, Junjie Jiang, Delei Kong, Chenming Hu, Yuetong Fang, Renjing Xu

TL;DR
Optimal Brain Apoptosis (OBA) introduces a precise second-order pruning method for CNNs and Transformers by directly computing Hessian-vector products, improving efficiency and performance in neural network compression.
Contribution
OBA advances parameter importance estimation by directly calculating Hessian-vector products, enabling more accurate and efficient pruning compared to previous approximation-based methods.
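For context, the second-order Taylor expansion behind OBD-style importance estimates can be written out as below. The symbols (θ for parameters, g for the gradient, H for the Hessian, δθ for the perturbation) are notation assumed here and do not appear on this page. The last line is one plausible reading of "a Hessian-vector product value for each parameter", not a definition confirmed by the source.

```latex
% Second-order Taylor expansion of the loss around the trained parameters
\delta L \;\approx\; g^{\top}\delta\theta \;+\; \tfrac{1}{2}\,\delta\theta^{\top} H\,\delta\theta,
\qquad g = \nabla_{\theta} L,\quad H = \nabla_{\theta}^{2} L .

% OBD-style diagonal approximation for removing parameter i
% (\delta\theta_i = -\theta_i, gradient term dropped near a minimum)
\delta L_i^{\mathrm{diag}} \;\approx\; \tfrac{1}{2}\, H_{ii}\,\theta_i^{2} .

% Keeping the off-diagonal terms: with \delta\theta = -\theta the quadratic
% term splits into per-parameter pieces that each need only one entry of H\theta
\tfrac{1}{2}\,\theta^{\top} H\,\theta
\;=\; \sum_i \tfrac{1}{2}\,\theta_i \sum_j H_{ij}\,\theta_j
\;=\; \sum_i \tfrac{1}{2}\,\theta_i\,(H\theta)_i .
```

Under this reading, the Hessian itself never has to be formed: a single Hessian-vector product Hθ supplies every per-parameter term.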
Findings
OBA achieves higher pruning accuracy with less performance loss.
The method is validated on multiple architectures and datasets.
Code availability facilitates reproducibility and further research.
Abstract
The increasing complexity and parameter count of Convolutional Neural Networks (CNNs) and Transformers pose challenges in terms of computational efficiency and resource demands. Pruning has been identified as an effective strategy to address these challenges by removing redundant elements such as neurons, channels, or connections, thereby enhancing computational efficiency without heavily compromising performance. This paper builds on the foundational work of Optimal Brain Damage (OBD) by advancing the methodology of parameter importance estimation using the Hessian matrix. Unlike previous approaches that rely on approximations, we introduce Optimal Brain Apoptosis (OBA), a novel pruning method that calculates the Hessian-vector product value directly for each parameter. By decomposing the Hessian matrix across network layers and identifying conditions under which inter-layer Hessian…
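A minimal sketch of how a Hessian-vector product can feed a per-parameter importance score, assuming a hypothetical three-parameter toy loss. Everything here is illustrative: the names `loss`, `grad`, `hvp`, and `importance` are made up for this example, the score formula is an assumed second-order Taylor attribution rather than the paper's stated definition, and the central finite difference stands in for the direct computation the abstract describes.

```python
# Toy illustration only: none of these names or formulas come from the paper.

def loss(theta):
    # Hypothetical loss with coupled parameters, so the Hessian has
    # non-zero off-diagonal entries.
    a, b, c = theta
    return (a * b - 1.0) ** 2 + 0.5 * (b + c) ** 2 + 0.1 * a ** 2


def grad(theta):
    # Analytic gradient of the toy loss above.
    a, b, c = theta
    r = a * b - 1.0
    return [2.0 * r * b + 0.2 * a,
            2.0 * r * a + (b + c),
            (b + c)]


def hvp(theta, v, eps=1e-5):
    # Hessian-vector product H v, approximated by a central difference of
    # the gradient along v. The Hessian matrix is never built explicitly.
    plus = grad([t + eps * x for t, x in zip(theta, v)])
    minus = grad([t - eps * x for t, x in zip(theta, v)])
    return [(p - m) / (2.0 * eps) for p, m in zip(plus, minus)]


def importance(theta):
    # Assumed score: |-g_i * theta_i + 0.5 * theta_i * (H theta)_i|,
    # the per-parameter share of the second-order Taylor estimate of the
    # loss change when the parameters are set to zero.
    g = grad(theta)
    h_theta = hvp(theta, theta)
    return [abs(-gi * ti + 0.5 * ti * hi)
            for gi, ti, hi in zip(g, theta, h_theta)]


if __name__ == "__main__":
    theta = [1.0, 2.0, -0.5]
    scores = importance(theta)
    # Parameters with the smallest score would be pruned first.
    order = sorted(range(len(theta)), key=lambda i: scores[i])
    print("scores:", scores)
    print("pruning order (least important first):", order)
```

In a real network the gradient would come from automatic differentiation, and the Hessian-vector product is usually obtained by differentiating the gradient along a direction rather than by finite differences.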
Peer Reviews
Decision·ICLR 2025 Poster
- Unlike previous methods that approximate the Hessian matrix, OBA calculates the full Hessian-vector product, providing a more accurate measure of parameter importance and leading to more precise pruning.
- OBA supports both structured pruning (removing entire neurons, channels, or layers) and unstructured pruning (removing individual weights). It's compatible with a wide range of architectures.
- The approach optimizes the calculation of the Hessian-vector product (reduced computational comple
- It seems like calculating the full Hessian-vector product, even with optimizations, can still be computationally expensive for larger and more complex networks.
- Extending this method to newer architectures seems to require additional work.
- The method was tested on a specific set of architectures, and its generalizability across a wider range of tasks or domains is yet to be fully established.
- The experiments make the paper feel somewhat outdated, as the evaluations and datasets used are
An interesting new method for computing the Hessian of the loss function in feed-forward networks in a tractable way.
The improvements in accuracy and speed do not seem overwhelming.
1. Pruning performance looks fairly promising. The method achieves consistent improvement on various datasets, using the most commonly-used backbones (ResNet and ViT), and on unstructured vs structured pruning.
2. The results shown are quite comprehensive: pruning performance (accuracy, parameter reduction, FLOPs reduction, throughput increase), pruning cost (training and pruning time). Surprisingly, the pruning cost was not as high as I originally expected knowing that the method involves compu
1. Since pruning is not my area of expertise, I am unsure whether the authors used fair baselines for comparison. The proposed method is mainly compared against 7 methods from 3 papers, published in 2016, 2017, and 2019 respectively. A simple literature search gave me a few methods that claim to have achieved better pruning performance and are fairly well cited and fairly highly starred: https://arxiv.org/abs/2203.04248, https://arxiv.org/abs/2208.11580, https://arxiv.org/abs/2210.04092, https://arxiv.org/
Taxonomy
Topics: Anesthesia and Neurotoxicity Research
Methods: Pruning
