The Impact of Activation Sparsity on Overfitting in Convolutional Neural Networks

Karim Huesmann, Luis Garcia Rodriguez, Lars Linsen, and Benjamin Risse

arXiv:2104.06153 · cs.LG · April 14, 2021

Open Access

TL;DR

This paper investigates how activation sparsity relates to overfitting in CNNs, introducing new explainable AI measures and a differentiable sparsity penalty to improve understanding and control of overfitting.

Contribution

It introduces a perplexity-based sparsity measure, visualizes layer-wise activation patterns, and proposes a differentiable penalty to study and reduce overfitting in CNNs.
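The summary here does not give the formula behind the perplexity-based sparsity measure, so the following is only a minimal sketch of one plausible reading, not the authors' definition. It assumes that the absolute activations of a layer are normalised into a probability distribution, that perplexity is the exponential of that distribution's Shannon entropy (roughly the "effective number" of active units), and that sparsity is one minus the perplexity divided by the number of units. The function names `perplexity` and `perplexity_sparsity` are hypothetical.

```python
import math


def perplexity(activations):
    """Perplexity of the normalised absolute activations.

    Assumption (not taken from the paper): activations are turned into a
    distribution p_i = |a_i| / sum_j |a_j|, and perplexity = exp(H(p)).
    """
    magnitudes = [abs(a) for a in activations]
    total = sum(magnitudes)
    if total == 0.0:
        # Convention chosen for this sketch: an all-zero layer has no active unit.
        return 0.0
    entropy = 0.0
    for m in magnitudes:
        if m > 0.0:  # terms with p = 0 contribute nothing to the entropy
            p = m / total
            entropy -= p * math.log(p)
    return math.exp(entropy)


def perplexity_sparsity(activations):
    """Sparsity in [0, 1]: 0 for uniform activations, 1 for an all-zero layer."""
    n = len(activations)
    if n == 0:
        raise ValueError("activations must not be empty")
    return 1.0 - perplexity(activations) / n


dense = [1.0, 1.0, 1.0, 1.0]      # every unit equally active
one_hot = [0.0, 0.0, 5.0, 0.0]    # a single active unit
print(perplexity_sparsity(dense))
print(perplexity_sparsity(one_hot))
```

Under these assumptions a uniformly active layer scores close to 0 and a layer with a single active unit out of four scores 0.75, so the value grows as fewer units carry the activation mass.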

Findings

01

Activation sparsity in the feature extraction layers increases shortly before the test loss starts rising.

02

Reduced sparsity improves generalization and classification performance.

03

Dense activations enable better feature learning without overfitting.

Abstract

Overfitting is one of the fundamental challenges when training convolutional neural networks and is usually identified by a diverging training and test loss. However, the underlying dynamics of how the flow of activations induces overfitting are poorly understood. In this study we introduce a perplexity-based sparsity definition to derive and visualise layer-wise activation measures. These novel explainable AI strategies reveal a surprising relationship between activation sparsity and overfitting, namely an increase in sparsity in the feature extraction layers shortly before the test loss starts rising. This tendency is preserved across network architectures and regularisation strategies, so our measures can be used as a reliable indicator for overfitting while decoupling the network's generalisation capabilities from its loss-based definition. Moreover, our differentiable sparsity…
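For context, the conventional loss-based criterion that the abstract contrasts with can be sketched as follows. This is a generic illustration, not a procedure from the paper: the function name `overfitting_onset` and the `patience` parameter are hypothetical, and the sketch simply reports the epoch of the lowest test loss once the loss has failed to improve for `patience` consecutive epochs.

```python
def overfitting_onset(test_loss, patience=3):
    """Return the epoch with the lowest test loss once the loss has not
    improved on it for `patience` epochs, or None if that never happens.

    Illustrative loss-based criterion only; the threshold is an assumption.
    """
    best_epoch = 0
    for epoch, loss in enumerate(test_loss):
        if loss < test_loss[best_epoch]:
            best_epoch = epoch          # new minimum, still improving
        elif epoch - best_epoch >= patience:
            return best_epoch           # test loss has been rising or flat
    return None


# Made-up loss curve for illustration: the minimum is at epoch 2.
history = [1.0, 0.8, 0.7, 0.72, 0.75, 0.8]
print(overfitting_onset(history))
```

An activation-based indicator such as the one described in the abstract would be compared against this kind of onset epoch, since the reported increase in sparsity appears shortly before the test loss starts rising.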

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning