Learning Neural Networks with Sparse Activations

Pranjal Awasthi; Nishanth Dikkala; Pritish Kamath; Raghu Meka

arXiv:2406.17989 · cs.LG · June 27, 2024

TL;DR

This paper investigates the theoretical properties of neural networks with sparse activations, showing they have advantages over dense networks and aiming to inspire practical methods to exploit this sparsity.

Contribution

The paper initiates a formal study of PAC learnability for MLP layers that exhibit activation sparsity, and shows that such function classes carry provable computational and statistical advantages over their non-sparse counterparts.

Findings

01 Sparsely activated networks are provably more efficient than dense ones.

02 Activation sparsity leads to better generalization bounds.

03 Theoretical results suggest practical advantages of exploiting sparsity.

Abstract

A core component present in many successful neural network architectures is an MLP block of two fully connected layers with a non-linear activation in between. An intriguing phenomenon observed empirically, including in transformer architectures, is that, after training, the activations in the hidden layer of this MLP block tend to be extremely sparse on any given input. Unlike traditional forms of sparsity, where there are neurons or weights that can be deleted from the network, this form of dynamic activation sparsity appears to be harder to exploit to get more efficient networks. Motivated by this, we initiate a formal study of PAC learnability of MLP layers that exhibit activation sparsity. We present a variety of results showing that such classes of functions do lead to provable computational and statistical advantages over their non-sparse counterparts. Our hope is that a…
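To make the notion of dynamic activation sparsity concrete, the following is a minimal NumPy sketch, not taken from the paper. It builds the MLP block described in the abstract (two fully connected layers with a ReLU in between), counts how many hidden units fire per input, and checks that the second layer can be evaluated using only the active units. The weights are random rather than trained, and the strongly negative bias `b1`, the sizes `d`, `m`, `n`, and the helper names are all illustrative assumptions chosen only to induce sparsity.

```python
import numpy as np

def mlp_block(x, W1, b1, W2):
    """Two fully connected layers with a ReLU activation in between."""
    hidden = np.maximum(x @ W1 + b1, 0.0)  # hidden activations, shape (n, m)
    return hidden @ W2, hidden

def sparse_second_layer(hidden_row, W2):
    """Evaluate the second layer for one input using only its active units."""
    idx = np.flatnonzero(hidden_row)       # active set depends on the input
    return hidden_row[idx] @ W2[idx]

rng = np.random.default_rng(0)
d, m, n = 16, 256, 100                     # hypothetical sizes
W1 = rng.standard_normal((d, m))
b1 = np.full(m, -6.0)                      # assumption: negative bias so few units fire
W2 = rng.standard_normal((m, 1))
x = rng.standard_normal((n, d))

out, hidden = mlp_block(x, W1, b1, W2)

# Number of non-zero hidden units for each input.
active_per_input = np.count_nonzero(hidden, axis=1)
print("mean active units per input:", active_per_input.mean(), "of", m)

# The set of active units differs from input to input, so no fixed
# neuron can simply be deleted from the network.
first_active = set(np.flatnonzero(hidden[0]))
second_active = set(np.flatnonzero(hidden[1]))
print("active sets identical for inputs 0 and 1:", first_active == second_active)

# Restricting the second layer to the active units gives the same output.
print(np.allclose(sparse_second_layer(hidden[0], W2), out[0]))
```

The sketch only illustrates why this kind of sparsity is per-input rather than structural; it says nothing about the learnability results themselves.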

Taxonomy

Topics: Neural Networks and Applications