Accelerating Linear Recurrent Neural Networks for the Edge with Unstructured Sparsity
Alessandro Pierro, Steven Abreu, Jonathan Timcheck, Philipp Stratmann, Andreas Wild, Sumit Bam Shrestha

TL;DR
This paper shows that unstructured sparsity improves the efficiency-performance trade-off of linear RNNs for edge applications, with state-of-the-art audio denoising results and lower latency and energy consumption on neuromorphic hardware.
Contribution
It provides a comprehensive scaling study of sparse linear RNNs, showing their superior efficiency-performance trade-offs and practical deployment on neuromorphic chips for real-time edge tasks.
Findings
Highly sparse linear RNNs need 2x less compute and 36% less memory than dense baselines at the same accuracy.
The models attain state-of-the-art results in audio denoising.
Deployment on neuromorphic hardware yields 42x lower latency and 149x lower energy use.
Abstract
Linear recurrent neural networks enable powerful long-range sequence modeling with constant memory usage and time-per-token during inference. These architectures hold promise for streaming applications at the edge, but deployment in resource-constrained environments requires hardware-aware optimizations to minimize latency and energy consumption. Unstructured sparsity offers a compelling solution, enabling substantial reductions in compute and memory requirements, when accelerated by compatible hardware platforms. In this paper, we conduct a scaling study to investigate the Pareto front of performance and efficiency across inference compute budgets. We find that highly sparse linear RNNs consistently achieve better efficiency-performance trade-offs than dense baselines, with 2x less compute and 36% less memory at iso-accuracy. Our models achieve state-of-the-art results on a real-time…
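To make the two ideas in the abstract concrete, the following is a minimal sketch, not the paper's model or training recipe. It assumes a generic diagonal linear recurrence h_t = a * h_{t-1} + W x_t and one-shot magnitude pruning of W. The names `magnitude_prune`, `sparse_matvec`, and `linear_rnn_step`, the layer sizes, and the 75% sparsity level are all illustrative choices. The sketch shows that the state has a fixed size regardless of sequence length, and that skipping zeroed weights reduces the multiply-accumulate (MAC) count per token.

```python
import random


def magnitude_prune(weights, sparsity):
    """Unstructured pruning: zero the smallest-magnitude entries of a 2-D list."""
    rows, cols = len(weights), len(weights[0])
    # Rank every (row, col) position by the magnitude of its weight.
    order = sorted(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: abs(weights[rc[0]][rc[1]]),
    )
    pruned = [row[:] for row in weights]
    for r, c in order[: int(sparsity * rows * cols)]:
        pruned[r][c] = 0.0
    return pruned


def sparse_matvec(weights, x):
    """Return (W @ x, number of MACs performed), skipping zero weights."""
    out, macs = [], 0
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w != 0.0:
                acc += w * xi
                macs += 1
        out.append(acc)
    return out, macs


def linear_rnn_step(state, x, decay, w_in):
    """One token: h_t = decay * h_{t-1} + W_in x_t (elementwise decay, assumed)."""
    drive, macs = sparse_matvec(w_in, x)
    new_state = [a * h + d for a, h, d in zip(decay, state, drive)]
    return new_state, macs


random.seed(0)
d_in, d_state = 8, 16  # illustrative sizes
w_dense = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_state)]
w_sparse = magnitude_prune(w_dense, sparsity=0.75)

decay = [0.9] * d_state
state = [0.0] * d_state
total_macs = 0
for _ in range(100):  # stream 100 tokens; the state never grows
    x = [random.gauss(0, 1) for _ in range(d_in)]
    state, macs = linear_rnn_step(state, x, decay, w_sparse)
    total_macs += macs
```

In this toy setting the dense input projection would cost 128 MACs per token and the pruned one costs 32. Such savings only turn into lower latency and energy when the hardware can skip zeros efficiently, which is the point the abstract makes about compatible hardware platforms.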
Taxonomy
Topics: Neural Networks and Applications
