Fixed width treelike neural networks capacity analysis -- generic activations
Mihailo Stojnic

TL;DR
This paper extends the capacity analysis of treelike committee machine (TCM) neural networks from sign activations to more general activation functions. The resulting capacity bounds are largest for networks with just two hidden neurons and decrease toward a limiting value as the hidden layer widens.
Contribution
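It shows that the existing Random Duality Theory (RDT) and partially lifted RDT capacity analysis frameworks, previously applied to sign activations, also handle linear, quadratic, and ReLU activations, and that the resulting capacity bounds decrease with width and peak at two hidden neurons.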
Findings
Capacity bounds decrease with increasing hidden layer width.
Maximum capacity is achieved with exactly two hidden neurons.
Results align with statistical physics predictions.
Abstract
We consider the capacity of \emph{treelike committee machines} (TCM) neural networks. Relying on Random Duality Theory (RDT), \cite{Stojnictcmspnncaprdt23} recently introduced a generic framework for their capacity analysis. An upgrade based on the so-called \emph{partially lifted} RDT (pl RDT) was then presented in \cite{Stojnictcmspnncapliftedrdt23}. Both lines of work focused on the networks with the most typical, \emph{sign}, activations. Here, on the other hand, we focus on networks with other, more general, types of activations and show that the frameworks of \cite{Stojnictcmspnncaprdt23,Stojnictcmspnncapliftedrdt23} are sufficiently powerful to enable handling of such scenarios as well. In addition to the standard \emph{linear} activations, we uncover that particularly convenient results can be obtained for two very commonly used activations, namely, the \emph{quadratic} and…
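To make the setting concrete, the following is a minimal sketch of a treelike committee machine forward pass with the activations named above. It is not taken from the paper: it assumes the common TCM definition in which the input is split into d disjoint blocks, one per hidden neuron, and the output is the sign of the summed hidden activations minus a threshold. The names `tcm_output`, `ACTIVATIONS`, and the `threshold` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np

# Illustrative activation choices; the dictionary and its keys are assumptions.
ACTIVATIONS = {
    "sign": np.sign,
    "linear": lambda z: z,
    "quadratic": lambda z: z ** 2,
    "relu": lambda z: np.maximum(z, 0.0),
}


def tcm_output(x, weights, activation="sign", threshold=0.0):
    """Return the +1/-1 output of a treelike committee machine (sketch).

    weights has shape (d, n // d): row j holds the weights of hidden neuron j,
    which only sees the j-th disjoint block of the input (the "treelike" part).
    threshold is an assumed output offset; quadratic and ReLU hidden outputs
    are nonnegative, so a nonzero threshold is needed to reach both labels.
    """
    weights = np.asarray(weights, dtype=float)
    d, block = weights.shape
    x_blocks = np.asarray(x, dtype=float).reshape(d, block)
    # Pre-activation of each hidden neuron: inner product with its own block.
    pre = np.einsum("jk,jk->j", weights, x_blocks)
    hidden = ACTIVATIONS[activation](pre)
    # Ties at exactly zero are mapped to +1 by convention in this sketch.
    return 1 if hidden.sum() - threshold >= 0 else -1
```

For example, with d = 2 hidden neurons and n = 4 inputs, `tcm_output([1, 2, -1, -1], np.ones((2, 2)), "relu", threshold=1.0)` evaluates the two block inner products (3 and -2), applies ReLU to each, and thresholds the sum.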
Taxonomy
Topics: Neural Networks and Applications
