Spatio-Temporal Pruning and Quantization for Low-latency Spiking Neural Networks
Sayeed Shafayet Chowdhury, Isha Garg, Kaushik Roy

TL;DR
This paper combines spatial and temporal pruning with quantization for spiking neural networks (SNNs). It reduces inference latency by 3-30X, and lowers energy consumption by 8-14X compared to standard deep networks, while maintaining accuracy and robustness.
Contribution
It proposes a spatio-temporal pruning method for SNNs that achieves 10-14X model compression and lower inference latency with minimal accuracy loss.
Findings
Achieved 10-14X model compression through spatial pruning.
Reduced inference latency by 3-30X while maintaining accuracy.
Lowered energy consumption by 8-14X compared to standard deep networks.
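To make the compression figure concrete, the following is a rough sketch of how removing channels from convolutional layers shrinks the weight count. Because each layer's weights scale with the product of its input and output channels, narrowing every layer compounds. The layer widths, the 3x3 kernel size, and the helper names are hypothetical choices for illustration only and are not taken from the paper.

```python
def conv_params(c_in, c_out, k=3):
    """Weight count of a k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def total_params(widths, in_channels=3, k=3):
    """Sum the conv weights of a plain stack of layers with the given widths."""
    total, prev = 0, in_channels
    for w in widths:
        total += conv_params(prev, w, k)
        prev = w
    return total


def compression_ratio(orig_widths, pruned_widths, in_channels=3, k=3):
    """Ratio of original to pruned weight counts."""
    return total_params(orig_widths, in_channels, k) / total_params(
        pruned_widths, in_channels, k
    )


# Hypothetical widths: halving every layer removes roughly three quarters
# of the weights, since both c_in and c_out shrink for the inner layers.
ratio = compression_ratio([64, 128, 256], [32, 64, 128])
```

Halving the widths in this toy stack gives a ratio just under 4, which suggests why structured pruning of layer widths can reach large compression factors when many layers are narrowed at once.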
Abstract
Spiking Neural Networks (SNNs) are a promising alternative to traditional deep learning methods since they perform event-driven information processing. However, a major drawback of SNNs is high inference latency. The efficiency of SNNs could be enhanced using compression methods such as pruning and quantization. Notably, SNNs, unlike their non-spiking counterparts, consist of a temporal dimension, the compression of which can lead to latency reduction. In this paper, we propose spatial and temporal pruning of SNNs. First, structured spatial pruning is performed by determining the layer-wise significant dimensions using principal component analysis of the average accumulated membrane potential of the neurons. This step leads to 10-14X model compression. Additionally, it enables inference with lower latency and decreases the spike count per inference. To further reduce latency, temporal…
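The spatial pruning step described in the abstract can be sketched as a PCA over per-channel activity. The sketch below assumes the time-averaged accumulated membrane potential of one layer has already been collected into a matrix with one row per sample and one column per channel, and it keeps as many principal components as are needed to reach a chosen explained-variance threshold. The function name, the 0.99 threshold, and the synthetic data are assumptions for illustration; the abstract does not specify these details.

```python
import numpy as np


def significant_dims(avg_potential, var_threshold=0.99):
    """Number of principal components needed to explain `var_threshold`
    of the variance in `avg_potential`.

    avg_potential: array of shape (n_samples, n_channels); each row holds
    the time-averaged accumulated membrane potential of a layer's channels.
    """
    x = np.asarray(avg_potential, dtype=np.float64)
    x = x - x.mean(axis=0, keepdims=True)  # center each channel
    # Squared singular values of centered data are proportional to the
    # variances along the principal components.
    s = np.linalg.svd(x, compute_uv=False)
    var = s ** 2
    total = var.sum()
    if total == 0.0:
        return 0
    cum = np.cumsum(var) / total
    # First index whose cumulative explained variance reaches the threshold.
    k = int(np.searchsorted(cum, var_threshold)) + 1
    return min(k, len(cum))


# Synthetic stand-in for recorded potentials (assumption, not paper data):
# 16 channels driven by only 3 underlying factors, versus 16 independent ones.
rng = np.random.default_rng(0)
low_rank = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 16))
full_rank = rng.normal(size=(500, 16))

k_low = significant_dims(low_rank)
k_full = significant_dims(full_rank)
```

On the low-rank data the function reports at most 3 significant dimensions, while the independent data needs far more. In a pruning pipeline, a count like this would set the reduced width of the layer.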
Taxonomy
Methods: Pruning · Dense Connections · Softmax · Max Pooling · Dropout · Convolution
