Truly Sparse Neural Networks at Scale
Selima Curci, Decebal Constantin Mocanu, Mykola Pechenizkiy

TL;DR
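This paper shows that truly sparse neural networks, implemented without a binary mask over dense matrices, can be trained at scale. It does so with a parallel training algorithm, a purpose-built activation function, and a hidden-neuron importance metric, and reports state-of-the-art performance.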
Contribution
The paper introduces a parallel training algorithm, a specialized activation function, and a neuron importance metric to effectively train fully sparse neural networks.
Findings
Achieved the largest neural network in terms of representational power.
State-of-the-art performance with truly sparse networks.
Enabled environmentally friendly AI through efficiency improvements.
Abstract
Recently, sparse training methods have started to be established as a de facto approach for training and inference efficiency in artificial neural networks. Yet, this efficiency is just in theory. In practice, everyone uses a binary mask to simulate sparsity since the typical deep learning software and hardware are optimized for dense matrix operations. In this paper, we take an orthogonal approach, and we show that we can train truly sparse neural networks to harvest their full potential. To achieve this goal, we introduce three novel contributions, specially designed for sparse neural networks: (1) a parallel training algorithm and its corresponding sparse implementation from scratch, (2) an activation function with non-trainable parameters to favour the gradient flow, and (3) a hidden neurons importance metric to eliminate redundancies. All in one, we are able to break the record and…
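To make the distinction between simulated and true sparsity concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the coordinate-list (row, col, value) storage, and the toy layer sizes are assumptions chosen for illustration. It contrasts a masked dense matrix-vector product, which stores and multiplies every entry, with a product that stores and touches only the connections that exist.

```python
import random


def masked_dense_matvec(weights, mask, x):
    """Simulated sparsity: every entry is stored and multiplied, zeros included."""
    return [
        sum(w * m * xi for w, m, xi in zip(w_row, m_row, x))
        for w_row, m_row in zip(weights, mask)
    ]


def to_coo(weights, mask):
    """Keep only existing connections as (row, col, value) triples."""
    return [
        (i, j, weights[i][j])
        for i, m_row in enumerate(mask)
        for j, m in enumerate(m_row)
        if m
    ]


def sparse_matvec(coo, n_rows, x):
    """True sparsity: work and memory scale with the number of connections."""
    out = [0.0] * n_rows
    for i, j, w in coo:
        out[i] += w * x[j]
    return out


# Toy layer (hypothetical sizes): 4 outputs, 6 inputs, roughly 25% density.
random.seed(0)
n_out, n_in, density = 4, 6, 0.25
weights = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
mask = [[1 if random.random() < density else 0 for _ in range(n_in)]
        for _ in range(n_out)]
x = [random.uniform(-1, 1) for _ in range(n_in)]

coo = to_coo(weights, mask)
dense_out = masked_dense_matvec(weights, mask, x)
sparse_out = sparse_matvec(coo, n_out, x)
```

Both paths produce the same output, but the masked version always performs n_out × n_in multiplications, while the sparse version performs one per stored connection. That gap is the theoretical efficiency the abstract says masks fail to realize in practice.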
Taxonomy
Topics: Neural Networks and Applications · Advanced Neural Network Applications · Machine Learning and ELM
