Slope and generalization properties of neural networks
Anton Johansson, Niklas Engsner, Claes Strannegård, Petter Mostad

TL;DR
This paper proposes controlling the slope of a neural network to improve generalization. It gives theoretical properties of the slope and empirical evidence, on ReLU networks, that the slope distribution of a well-trained classifier is largely independent of layer width, depends only weakly on architecture, and varies smoothly.
Contribution
It proposes the slope as a measure to control neural network complexity, with theoretical properties and empirical validation across different architectures.
Findings
Slope distribution is independent of layer width in trained networks.
Mean slope has weak dependence on architecture.
Slope varies smoothly and aligns with theoretical predictions.
Abstract
Neural networks are very successful tools in, for example, advanced classification. From a statistical point of view, fitting a neural network may be seen as a kind of regression, where we seek a function from the input space to a space of classification probabilities that follows the "general" shape of the data but avoids overfitting by not memorizing individual data points. In statistics, this can be done by controlling the geometric complexity of the regression function. We propose to do something similar when fitting neural networks by controlling the slope of the network. After defining the slope and discussing some of its theoretical properties, we go on to show empirically in examples, using ReLU networks, that the distribution of the slope of a well-trained neural network classifier is generally independent of the width of the layers in a fully connected network, and…
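The abstract as shown here does not state how the slope is defined. As a rough illustration only, the sketch below assumes the slope at an input point is the spectral norm of the network's input-output Jacobian, which may differ from the paper's definition. It uses a small, randomly initialized (untrained) fully connected ReLU network that outputs logits rather than classification probabilities. The helper names `init_relu_net`, `forward`, `jacobian`, and `slope` are hypothetical and not taken from the paper.

```python
import numpy as np

def init_relu_net(sizes, seed=0):
    """Random (untrained) weights and biases for a fully connected network."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), size=(n, m)),
             rng.normal(0.0, 0.1, size=n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Network output (logits): ReLU on hidden layers, linear last layer."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(W @ h + b, 0.0)
    W, b = params[-1]
    return W @ h + b

def jacobian(params, x):
    """Input-output Jacobian at x.

    A ReLU network is piecewise linear, so the Jacobian is the product of
    the weight matrices with inactive hidden units masked out.
    """
    h = x
    J = np.eye(x.shape[0])
    for W, b in params[:-1]:
        z = W @ h + b
        mask = (z > 0).astype(float)   # 1 where the unit is active
        J = mask[:, None] * (W @ J)    # zero the rows of inactive units
        h = np.maximum(z, 0.0)
    W, _ = params[-1]
    return W @ J

def slope(params, x):
    """ASSUMED definition: spectral norm (largest singular value) of the Jacobian."""
    return np.linalg.norm(jacobian(params, x), 2)

# Empirical slope distribution over random inputs for one architecture.
params = init_relu_net([4, 16, 16, 3])
rng = np.random.default_rng(1)
xs = rng.normal(size=(200, 4))
slopes = np.array([slope(params, x) for x in xs])
print("mean slope:", slopes.mean(), "std:", slopes.std())
```

Changing the hidden widths in `init_relu_net([4, 16, 16, 3])` and comparing the resulting `slopes` arrays gives a crude version of the width comparison the abstract describes, though the paper's claim concerns well-trained networks, not random initializations.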
Taxonomy
Topics: Neural Networks and Applications
