A Theoretical Explanation of Activation Sparsity through Flat Minima and Adversarial Robustness

Ze Peng; Lei Qi; Yinghuan Shi; Yang Gao

arXiv:2309.03004 · cs.LG · October 27, 2023

Open Access

TL;DR

This paper provides a theoretical explanation for activation sparsity in deep neural networks, linking it to flat minima and adversarial robustness, and demonstrates practical sparsity improvements with new modules.

Contribution

It introduces gradient sparsity as a source of activation sparsity and offers a theoretical framework connecting sparsity to flat minima and robustness in deep models.

Findings

01. Activation sparsity correlates with flat minima in well-trained models.

02. Proposed modules achieve 50% sparsity improvements on ImageNet-1k and C4.

03. Spectral concentration of weight matrices supports the sparsity explanation.

Abstract

A recent empirical observation (Li et al., 2022b) of activation sparsity in MLP blocks offers an opportunity to drastically reduce computation costs for free. Existing theoretical explanations attribute this sparsity to training dynamics, but they are restricted to shallow networks, few training steps, and special training procedures, even though the sparsity emerges in deep models trained in the standard way for a large number of steps. To fill these gaps, we propose the notion of gradient sparsity as one source of activation sparsity, together with a theoretical explanation based on it that sees sparsity as a necessary step toward adversarial robustness w.r.t. hidden features and parameters, which is approximately the flatness of minima for well-learned models. The theory applies to standardly trained LayerNorm-ed MLPs, and further to Transformers or other architectures trained with weight noise. Eliminating…
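As a rough illustration of the quantity the abstract refers to, the sketch below measures activation sparsity as the fraction of hidden units that are exactly zero after a ReLU in the first layer of an MLP block. It is not the authors' code: the function name `activation_sparsity`, the layer sizes, and the random initialization are all assumptions chosen for illustration, and a randomly initialized layer says nothing about the sparsity of a trained model.

```python
import numpy as np

def activation_sparsity(x, w_in, b_in):
    """Fraction of hidden activations that are exactly zero after ReLU.

    Hypothetical helper for illustration; x has shape (batch, d_model),
    w_in has shape (d_model, d_hidden), b_in has shape (d_hidden,).
    """
    hidden = np.maximum(x @ w_in + b_in, 0.0)  # ReLU on the pre-activations
    return float(np.mean(hidden == 0.0))

# Assumed toy sizes: 32 inputs, model width 64, hidden width 256.
rng = np.random.default_rng(0)
x = rng.standard_normal((32, 64))
w_in = rng.standard_normal((64, 256)) / np.sqrt(64)
b_in = np.zeros(256)

sparsity = activation_sparsity(x, w_in, b_in)
print(f"activation sparsity at random init: {sparsity:.3f}")
```

With zero bias and symmetric random weights, roughly half of the pre-activations are negative, so the measured value sits near 0.5. The empirical observation cited in the abstract concerns trained models, where the same measurement would be taken on real hidden activations.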

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Sparse and Compressive Sensing Techniques