TL;DR
This paper introduces a Hessian-based method for layer-wise quantization of Spiking Neural Networks, reducing memory and energy consumption while maintaining high accuracy.
Contribution
It proposes a novel layer-wise Hessian trace analysis for optimal bit-precision allocation and a simplified neuron model for efficient training.
Findings
Layer-wise Hessian trace correlates with quantization sensitivity.
Achieved a 58% reduction in network size with only a 0.2% accuracy drop.
Enhanced energy efficiency and throughput on neuromorphic hardware.
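The first finding relies on estimating the trace of each layer's Hessian. The summary does not say how the paper computes it; a common choice for this kind of analysis is Hutchinson's randomized estimator, sketched below under that assumption. The toy quadratic loss, the `curvature` vectors, and the layer names are all hypothetical stand-ins for a real network and its backpropagated gradients.

```python
import random

def grad(w, curvature):
    # Gradient of the toy loss L(w) = 0.5 * sum(c_i * w_i^2).
    # Hypothetical stand-in for a gradient obtained by backpropagation.
    return [c * x for c, x in zip(curvature, w)]

def hvp(w, v, curvature, eps=1e-3):
    # Hessian-vector product H @ v via a central finite difference of the gradient.
    g_plus = grad([x + eps * d for x, d in zip(w, v)], curvature)
    g_minus = grad([x - eps * d for x, d in zip(w, v)], curvature)
    return [(a - b) / (2 * eps) for a, b in zip(g_plus, g_minus)]

def hutchinson_trace(w, curvature, num_samples=200, seed=0):
    # Hutchinson's estimator: trace(H) is the expectation of v^T H v
    # over random vectors v with independent +/-1 (Rademacher) entries.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        v = [rng.choice((-1.0, 1.0)) for _ in w]
        hv = hvp(w, v, curvature)
        total += sum(a * b for a, b in zip(v, hv))
    return total / num_samples

# Two hypothetical layers: (weights, per-weight curvature of the toy loss).
layers = {
    "conv1": ([0.1, -0.2, 0.3], [4.0, 5.0, 6.0]),
    "fc": ([0.5, 0.4, -0.1], [0.1, 0.2, 0.1]),
}
# Average trace per parameter, so layers of different sizes are comparable.
avg_trace = {name: hutchinson_trace(w, c) / len(w) for name, (w, c) in layers.items()}
print(avg_trace)
```

Because the toy Hessian is diagonal, every sample returns the exact trace. For a real network the Hessian is dense, the estimate is noisy, and the number of samples controls the variance. A larger average trace indicates a layer whose loss is more sensitive to weight perturbations such as quantization error.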
Abstract
To achieve the low latency, high throughput, and energy efficiency benefits of Spiking Neural Networks (SNNs), reducing the memory and compute requirements when running on neuromorphic hardware is an important step. Neuromorphic architectures allow massively parallel computation with variable and local bit-precisions. However, how different bit-precisions should be allocated to different layers or connections of the network is not trivial. In this work, we demonstrate how a layer-wise Hessian trace analysis can measure the sensitivity of the loss to any perturbation of the layer's weights, and how this can be used to guide the allocation of a layer-specific bit-precision when quantizing an SNN. In addition, current gradient-based methods of SNN training use a complex neuron model with multiple state variables, which is not ideal for compute and memory efficiency. To address this…
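The allocation step described in the abstract can be illustrated with a minimal sketch: given a sensitivity score per layer, more sensitive layers receive more bits, and each layer's weights are then quantized at its assigned precision. The ranking rule, the candidate bit-widths, the symmetric uniform quantizer, and the layer names below are assumptions made for illustration; the abstract does not specify the paper's actual allocation rule or quantizer.

```python
def allocate_bits(avg_traces, bit_choices=(8, 6, 4)):
    # Rank layers from most to least sensitive (largest average Hessian trace first)
    # and spread the ranks evenly over the candidate bit-widths (assumed rule).
    ranked = sorted(avg_traces, key=avg_traces.get, reverse=True)
    bits = {}
    for rank, name in enumerate(ranked):
        idx = min(rank * len(bit_choices) // len(ranked), len(bit_choices) - 1)
        bits[name] = bit_choices[idx]
    return bits

def quantize(weights, num_bits):
    # Symmetric uniform quantization: snap each weight to the nearest multiple
    # of a step size chosen so the largest magnitude maps to the top integer level.
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(x) for x in weights) or 1.0  # avoid a zero step for all-zero layers
    scale = max_abs / qmax
    return [round(x / scale) * scale for x in weights]

# Hypothetical per-layer sensitivity scores (average Hessian trace per parameter).
avg_traces = {"conv1": 5.0, "conv2": 1.2, "fc": 0.1}
bits = allocate_bits(avg_traces)
print(bits)  # {'conv1': 8, 'conv2': 6, 'fc': 4}

print(quantize([0.5, -1.0, 0.26], bits["fc"]))
```

With this quantizer the per-weight error is at most half a step, so halving the bit-width of a low-sensitivity layer trades a bounded perturbation for a proportional saving in weight memory.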
