Generalization of an Upper Bound on the Number of Nodes Needed to Achieve Linear Separability
Marjolein Troost, Katja Seeliger, Marcel van Gerven

TL;DR
This paper derives an upper bound on the number of nodes needed in a two-hidden-layer neural network to achieve linear separability, based on data structure and activation functions, with empirical validation.
Contribution
It provides a new theoretical upper bound on network size for linear separability considering data structure and activation functions, extending previous results.
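To make the notion of linear separability concrete, here is a minimal sketch on the XOR problem. It is an illustration only, not the construction used in the paper: the two hand-picked ReLU units in `hidden`, the helper `perceptron_separates`, and the epoch cap are all assumptions introduced for this example. The raw XOR inputs cannot be split by a single hyperplane, while their image under the two hidden units can.

```python
def relu(x):
    return max(0.0, x)

def perceptron_separates(points, labels, max_epochs=1000):
    """Return True if the perceptron rule reaches an epoch with no mistakes.

    Labels must be +1 or -1. A False result only means no separating
    hyperplane was found within max_epochs (hypothetical cap).
    """
    w = [0.0] * (len(points[0]) + 1)  # weights plus a bias term
    for _ in range(max_epochs):
        mistakes = 0
        for p, y in zip(points, labels):
            x = list(p) + [1.0]  # append constant 1 for the bias
            score = sum(wi * xi for wi, xi in zip(w, x))
            if y * score <= 0:  # misclassified or on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:
            return True
    return False

xor_inputs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
xor_labels = [-1, 1, 1, -1]

def hidden(p):
    # Two hand-picked ReLU units (an assumption for illustration).
    s = p[0] + p[1]
    return (relu(s), relu(s - 1.0))

hidden_points = [hidden(p) for p in xor_inputs]

print(perceptron_separates(xor_inputs, xor_labels))
print(perceptron_separates(hidden_points, xor_labels))
```

The paper's question is how many such nodes suffice in general; this sketch only shows that a hidden layer can turn a non-separable arrangement into a separable one.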
Findings
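The upper bound depends on the structure of the data; the derivation requires the activation function to be non-constant, increasing, and to have a left asymptote.
For the leaky ReLU, similar bounds hold under certain conditions on the slope.
Empirical results support the theoretical bounds.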
Abstract
An important issue in neural network research is how to choose the number of nodes and layers so as to solve a classification problem. We provide new intuitions based on earlier results by An et al. (2015) by deriving an upper bound on the number of nodes in networks with two hidden layers such that linear separability can be achieved. Concretely, we show that if the data can be described in terms of N finite sets and the activation function f used is non-constant, increasing and has a left asymptote, we can derive how many nodes are needed to linearly separate these sets. This will be an upper bound that depends on the structure of the data. This structure can be analyzed using an algorithm. For the leaky rectified linear activation function, we prove separately that under some conditions on the slope, the same number of layers and nodes as for the aforementioned activation functions…
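The three conditions on the activation function can be checked numerically. The sketch below rests on assumptions: "increasing" is read as non-decreasing, "left asymptote" is read as a finite limit as x tends to minus infinity, and the helper names, grid, and tolerances are hypothetical choices for this example. Under that reading, the sigmoid and ReLU pass all three checks, while the leaky ReLU fails the asymptote check, which fits the abstract's separate treatment of it.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def relu(x):
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    # The slope value is an arbitrary choice for illustration.
    return x if x >= 0 else slope * x

GRID = [i / 10.0 for i in range(-100, 101)]  # sample points in [-10, 10]

def is_non_constant(f):
    values = [f(x) for x in GRID]
    return max(values) > min(values)

def is_increasing(f, eps=1e-12):
    # Assumption: "increasing" is read as non-decreasing on the grid.
    values = [f(x) for x in GRID]
    return all(a <= b + eps for a, b in zip(values, values[1:]))

def has_left_asymptote(f, tol=1e-6):
    # Assumption: a left asymptote means f(x) settles to a finite value
    # as x -> -infinity; probe at -1e2, -1e3, ..., -1e6.
    probes = [f(-(10.0 ** k)) for k in range(2, 7)]
    return all(abs(a - b) < tol for a, b in zip(probes, probes[1:]))

def satisfies_conditions(f):
    return is_non_constant(f) and is_increasing(f) and has_left_asymptote(f)

for name, f in [("sigmoid", sigmoid), ("relu", relu), ("leaky_relu", leaky_relu)]:
    print(name, satisfies_conditions(f))
```

A finite-grid check like this is only a sanity test, not a proof of any of the three properties.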
Taxonomy
Topics: Neural Networks and Applications · Stochastic Gradient Optimization Techniques · Advanced Graph Neural Networks
