The Impact of Activation Sparsity on Overfitting in Convolutional Neural Networks
Karim Huesmann, Luis Garcia Rodriguez, Lars Linsen, and Benjamin Risse

TL;DR
This paper investigates how activation sparsity relates to overfitting in CNNs, introducing new explainable AI measures and a differentiable sparsity penalty to improve understanding and control of overfitting.
Contribution
It introduces a perplexity-based sparsity measure, visualizes layer-wise activation patterns, and proposes a differentiable penalty to study and reduce overfitting in CNNs.
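As a rough illustration of what a perplexity-based sparsity measure can look like, the sketch below normalises the absolute activations of one layer into a probability distribution, takes the exponential of its Shannon entropy (the perplexity, interpretable as an effective number of active units), and rescales it to a score in [0, 1]. The function name `perplexity_sparsity`, the normalisation, and the rescaling are assumptions made for this example; the paper's exact definition is not given in this summary and may differ.

```python
import numpy as np


def perplexity_sparsity(activations, eps=1e-12):
    """Hypothetical perplexity-based sparsity score in [0, 1].

    0 means all units are equally active (dense); 1 means a single
    unit carries all activation mass (maximally sparse).
    This is an assumed definition, not necessarily the paper's.
    """
    a = np.abs(np.asarray(activations, dtype=np.float64)).ravel()
    n = a.size
    if n < 2:
        raise ValueError("need at least two activation values")

    total = a.sum()
    if total <= eps:
        # Convention chosen for this sketch: an all-zero layer is maximally sparse.
        return 1.0

    # Normalise activation magnitudes into a probability distribution.
    p = a / total
    nonzero = p > 0

    # Shannon entropy (natural log); zero entries contribute nothing.
    entropy = -np.sum(p[nonzero] * np.log(p[nonzero]))

    # Perplexity = effective number of active units, between 1 and n.
    perplexity = np.exp(entropy)

    # Rescale so that perplexity n maps to 0 and perplexity 1 maps to 1.
    score = 1.0 - (perplexity - 1.0) / (n - 1.0)
    return float(np.clip(score, 0.0, 1.0))


if __name__ == "__main__":
    dense = np.ones(8)                    # every unit equally active
    one_hot = np.array([0.0, 0.0, 5.0, 0.0])  # one unit carries everything
    print(perplexity_sparsity(dense))
    print(perplexity_sparsity(one_hot))
```

In practice, a score like this could be computed per layer on the post-activation feature maps after each epoch and plotted next to the training and test loss. Because it is built from sums, logarithms, and exponentials, the same expression could in principle be written in an autodiff framework and used as a penalty term, although this NumPy version is not itself differentiable.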
Findings
Activation sparsity increases before overfitting occurs.
Reduced sparsity improves generalization and classification performance.
Dense activations enable better feature learning without overfitting.
Abstract
Overfitting is one of the fundamental challenges when training convolutional neural networks and is usually identified by a diverging training and test loss. The underlying dynamics of how the flow of activations induces overfitting are, however, poorly understood. In this study we introduce a perplexity-based sparsity definition to derive and visualise layer-wise activation measures. These novel explainable AI strategies reveal a surprising relationship between activation sparsity and overfitting, namely an increase in sparsity in the feature extraction layers shortly before the test loss starts rising. This tendency is preserved across network architectures and regularisation strategies, so our measures can be used as a reliable indicator for overfitting while decoupling the network's generalisation capabilities from its loss-based definition. Moreover, our differentiable sparsity…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning
