A Theoretical Explanation of Activation Sparsity through Flat Minima and Adversarial Robustness
Ze Peng, Lei Qi, Yinghuan Shi, Yang Gao

TL;DR
This paper provides a theoretical explanation for activation sparsity in deep neural networks, linking it to flat minima and adversarial robustness, and demonstrates practical sparsity improvements with new modules.
Contribution
It introduces gradient sparsity as a source of activation sparsity and offers a theoretical framework connecting sparsity to flat minima and robustness in deep models.
Findings
Activation sparsity correlates with flat minima in well-trained models.
Proposed modules achieve 50% sparsity improvements on ImageNet-1k and C4.
Spectral concentration of weight matrices supports the sparsity explanation.
Abstract
A recent empirical observation (Li et al., 2022b) of activation sparsity in MLP blocks offers an opportunity to drastically reduce computation costs for free. Although they attribute it to training dynamics, existing theoretical explanations of activation sparsity are restricted to shallow networks, small training steps and special training, despite its emergence in deep models standardly trained for a large number of steps. To fill these gaps, we propose the notion of gradient sparsity as one source of activation sparsity, together with a theoretical explanation based on it that sees sparsity as a necessary step toward adversarial robustness w.r.t. hidden features and parameters, which is approximately the flatness of minima for well-learned models. The theory applies to standardly trained LayerNorm-ed MLPs, and further to Transformers or other architectures trained with weight noise. Eliminating…
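To make the central quantity concrete, the sketch below measures activation sparsity as the fraction of exactly-zero hidden activations in a single ReLU MLP layer. It is a minimal illustration, not the authors' code: the helper names (`mlp_hidden`, `activation_sparsity`), the layer sizes, and the random Gaussian weights are all assumptions made for the example.

```python
import random


def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]


def mlp_hidden(x, W, b):
    """Hidden activations of one MLP layer: ReLU(W x + b).

    W is a list of rows (one row per hidden unit); x and b are lists.
    """
    pre = [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
    return relu(pre)


def activation_sparsity(hidden):
    """Fraction of hidden activations that are exactly zero."""
    return sum(1 for h in hidden if h == 0.0) / len(hidden)


# Hypothetical toy setup: random weights, NOT a trained model.
rng = random.Random(0)
d_in, d_hidden = 16, 64
W = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_hidden)]
b = [0.0] * d_hidden
x = [rng.gauss(0.0, 1.0) for _ in range(d_in)]

hidden = mlp_hidden(x, W, b)
sparsity = activation_sparsity(hidden)
print(f"activation sparsity: {sparsity:.2f}")
```

With random symmetric weights, roughly half of the units are inactive by chance. The phenomenon the abstract describes is that sparsity rises well above this baseline in trained models, which this toy setup does not reproduce.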
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Sparse and Compressive Sensing Techniques
