Compression and Inference of Spiking Neural Networks on Resource-Constrained Hardware
Karol C. Jurzec, Tomasz Szydlo, Maciej Wielgosz

TL;DR
This paper introduces a lightweight, optimized C runtime for spiking neural networks that enables efficient inference on resource-constrained hardware, achieving significant speedups and memory savings while maintaining accuracy.
Contribution
It presents a novel C-based runtime and model compression techniques for SNNs, facilitating deployment on embedded devices like microcontrollers.
Findings
Achieves ~10× speedup on a desktop CPU
Reduces memory usage, enabling microcontroller deployment
Maintains functional parity with the Python baseline
Abstract
Spiking neural networks (SNNs) communicate via discrete spikes in time rather than continuous activations. Their event-driven nature offers advantages for temporal processing and energy efficiency on resource-constrained hardware, but training and deployment remain challenging. We present a lightweight C-based runtime for SNN inference on edge devices and optimizations that reduce latency and memory without sacrificing accuracy. Trained models exported from snnTorch are translated to a compact C representation; static, cache-friendly data layouts and preallocation avoid interpreter and allocation overheads. We further exploit sparse spiking activity to prune inactive neurons and synapses, shrinking computation in upstream convolutional layers. Experiments on N-MNIST and ST-MNIST show functional parity with the Python baseline while achieving ~10× speedups on desktop CPU and additional…
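To make the ideas in the abstract concrete, here is a minimal sketch of a statically allocated leaky integrate-and-fire (LIF) layer with an activity-based pruning mask. It is not the authors' runtime. The names `lif_layer`, `lif_step`, and `prune_silent`, the layer sizes, and the reset-by-subtraction rule are all assumptions made for illustration, and a small dense layer stands in for the convolutional layers the paper targets.

```c
#include <stddef.h>
#include <stdint.h>

#define N_IN  4   /* hypothetical layer sizes, chosen only for illustration */
#define N_OUT 3

/* All state lives in one fixed-size struct: no heap allocation at inference time. */
typedef struct {
    float   weights[N_OUT][N_IN]; /* row-major: one contiguous row per output neuron */
    float   membrane[N_OUT];      /* membrane potentials carried across time steps */
    uint8_t active[N_OUT];        /* 1 = neuron is evaluated, 0 = neuron is pruned */
    float   beta;                 /* leak (decay) factor applied each time step */
    float   threshold;            /* firing threshold */
} lif_layer;

/* Advance the layer by one time step (assumed reset-by-subtraction).
 * Returns the number of output neurons that fired. */
static int lif_step(lif_layer *l, const uint8_t in_spikes[N_IN],
                    uint8_t out_spikes[N_OUT])
{
    int fired = 0;
    for (size_t o = 0; o < N_OUT; ++o) {
        out_spikes[o] = 0;
        if (!l->active[o])
            continue;                         /* pruned neurons cost nothing */
        float current = 0.0f;
        for (size_t i = 0; i < N_IN; ++i) {
            if (in_spikes[i])                 /* binary spikes: accumulate, no multiply */
                current += l->weights[o][i];
        }
        l->membrane[o] = l->beta * l->membrane[o] + current;
        if (l->membrane[o] > l->threshold) {
            out_spikes[o] = 1;
            l->membrane[o] -= l->threshold;   /* soft reset */
            ++fired;
        }
    }
    return fired;
}

/* Disable every neuron that never fired during a calibration run.
 * Returns the number of neurons that remain active. */
static size_t prune_silent(lif_layer *l, const uint32_t spike_counts[N_OUT])
{
    size_t kept = 0;
    for (size_t o = 0; o < N_OUT; ++o) {
        l->active[o] = (uint8_t)(spike_counts[o] > 0);
        kept += l->active[o];
    }
    return kept;
}
```

In this sketch a caller would run `lif_step` over a calibration set while counting spikes per neuron, pass the counts to `prune_silent`, and then run inference with the reduced set of active neurons. A real exporter could go further and drop the pruned rows from `weights` entirely, so that memory shrinks along with computation.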
Taxonomy
Topics: Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices · Neural Networks and Reservoir Computing
