Resource Efficient Neural Networks Using Hessian Based Pruning
Jack Chong, Manas Gupta, Lihui Chen

TL;DR
This paper introduces a faster, more memory-efficient method for neural network pruning that estimates the Hessian trace in FP16 precision, enabling quicker and more scalable model compression without noticeable accuracy loss.
Contribution
The paper proposes EHAP, a modified Hessian trace estimation method using FP16, significantly reducing computation time and memory usage in neural network pruning.
Findings
Speed ups of 17% to 44% in Hessian trace computation.
40% reduction in GPU memory usage during pruning.
No noticeable accuracy difference between FP16 and FP32 Hessian calculations.
Abstract
Neural network pruning is a practical way of reducing the size of trained models and the number of floating-point operations. One pruning approach uses the relative Hessian trace to calculate the sensitivity of each channel, in contrast to the more common magnitude pruning approach. However, the stochastic approach used to estimate the Hessian trace needs many iterations to converge. This can be time-consuming for larger models with many millions of parameters. To address this problem, we modify the existing approach by estimating the Hessian trace using FP16 precision instead of FP32. We test the modified approach (EHAP) on ResNet-32/ResNet-56/WideResNet-28-8 trained on CIFAR10/CIFAR100 image classification tasks and achieve faster computation of the Hessian trace. Specifically, our modified approach can achieve speed ups ranging from 17% to as much as…
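The abstract does not say which stochastic estimator is used. A common choice for Hessian trace estimation is Hutchinson's method, which averages v^T H v over random ±1 vectors v. The sketch below assumes that estimator and is not the authors' implementation. It replaces the network's Hessian-vector product with a small explicit symmetric matrix, and it imitates FP16 by rounding values through Python's half-precision `struct` format. The names `to_fp16`, `matvec`, and `hutchinson_trace` are hypothetical helpers introduced only for this illustration.

```python
import random
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision (FP16) value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def matvec(H, v, cast=float):
    """Compute H @ v, rounding the matrix entries and each output with `cast`.

    Assumption: the explicit matrix H stands in for a Hessian-vector product
    that a real pruning pipeline would obtain through automatic differentiation.
    """
    return [cast(sum(cast(h) * x for h, x in zip(row, v))) for row in H]


def hutchinson_trace(H, num_samples, cast=float, seed=0):
    """Estimate tr(H) as the mean of v^T H v over random +/-1 (Rademacher) vectors."""
    rng = random.Random(seed)
    n = len(H)
    total = 0.0
    for _ in range(num_samples):
        v = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        Hv = matvec(H, v, cast)
        total += sum(vi * hvi for vi, hvi in zip(v, Hv))
    return total / num_samples


# Toy symmetric matrix (illustrative values only); its exact trace is 4.5.
H = [[2.0, 0.3, 0.1],
     [0.3, 1.5, 0.2],
     [0.1, 0.2, 1.0]]

full = hutchinson_trace(H, num_samples=2000, cast=float)
half = hutchinson_trace(H, num_samples=2000, cast=to_fp16)
print("full-precision estimate:", full)
print("FP16-rounded estimate:  ", half)
```

With the same seed, both runs draw identical probe vectors, so any gap between the two estimates comes from FP16 rounding alone. The sampling noise of the estimator itself is typically much larger than that gap, which is consistent with the reported finding that FP16 and FP32 Hessian calculations give no noticeable accuracy difference.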
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification
Methods: Pruning · Attentive Walk-Aggregating Graph Neural Network · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
