A Theory of I/O-Efficient Sparse Neural Network Inference
Niels Gleinig, Tal Ben-Nun, Torsten Hoefler

TL;DR
This paper provides a theoretical framework for analyzing and optimizing the I/O complexity of sparse feedforward neural network inference, with reported speedups of up to 45x on real hardware.
Contribution
It establishes bounds on I/O operations for sparse neural networks and introduces algorithms to approach optimal I/O efficiency, including instance-specific sparsity considerations.
Findings
The bounds determine the optimal number of I/Os up to a factor of 2.
The presented method uses a number of I/Os within a factor of 2 of the optimum.
Empirical speedups of up to 45x on real hardware.
Abstract
As the accuracy of machine learning models increases at a fast rate, so does their demand for energy and compute resources. On a low level, the major part of these resources is consumed by data movement between different memory units. Modern hardware architectures contain a form of fast memory (e.g., cache, registers), which is small, and a slow memory (e.g., DRAM), which is larger but expensive to access. We can only process data that is stored in fast memory, which incurs data movement (input/output-operations, or I/Os) between the two units. In this paper, we provide a rigorous theoretical analysis of the I/Os needed in sparse feedforward neural network (FFNN) inference. We establish bounds that determine the optimal number of I/Os up to a factor of 2 and present a method that uses a number of I/Os within that range. Much of the I/O-complexity is determined by a few high-level…
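The two-level memory model described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's method or its bounds. It is a toy counter under assumptions chosen here for illustration: fast memory holds a fixed number of neuron values, eviction is LRU, each weight is read once per edge, and a partial sum that was updated must be written back when evicted. The function name `count_ios` and the example network are hypothetical.

```python
from collections import OrderedDict


def count_ios(edges, fast_size, inputs):
    """Count I/Os for processing `edges` in order, in a toy two-level memory model.

    Assumptions (illustrative, not taken from the paper):
      - fast memory holds `fast_size` neuron values, evicted in LRU order;
      - each edge weight is streamed from slow memory once (1 read I/O);
      - loading a neuron value that lives in slow memory costs 1 read I/O;
      - evicting a neuron value that was updated costs 1 write I/O;
      - all updated values still in fast memory are written back at the end.
    """
    if fast_size < 2:
        raise ValueError("need room for a source and a destination neuron")

    cache = OrderedDict()   # neuron id -> dirty flag, oldest entry first
    in_slow = set(inputs)   # neurons whose current value is in slow memory
    ios = 0

    def access(neuron, dirty):
        nonlocal ios
        if neuron in cache:
            cache[neuron] = cache[neuron] or dirty
            cache.move_to_end(neuron)
            return
        if neuron in in_slow:
            ios += 1                      # read the value from slow memory
        cache[neuron] = dirty
        if len(cache) > fast_size:
            victim, was_dirty = cache.popitem(last=False)
            if was_dirty:
                ios += 1                  # write the partial sum back
                in_slow.add(victim)

    for src, dst in edges:
        ios += 1                          # read the edge weight
        access(src, dirty=False)          # source value is only read
        access(dst, dirty=True)           # destination partial sum is updated

    ios += sum(1 for dirty in cache.values() if dirty)  # final write-back
    return ios


# Hypothetical 2-input, 4-output layer with all 8 connections present.
by_source = [(s, d) for s in (0, 1) for d in (2, 3, 4, 5)]
by_destination = [(s, d) for d in (2, 3, 4, 5) for s in (0, 1)]

print("by source:     ", count_ios(by_source, fast_size=3, inputs={0, 1}))
print("by destination:", count_ios(by_destination, fast_size=3, inputs={0, 1}))
```

In this toy setting, processing the connections grouped by destination neuron needs fewer I/Os than grouping them by source, because each partial sum is finished before it is evicted. This is only meant to show why the order of computation affects data movement.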
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Neural Networks and Applications
Methods: Test
