When narrower is better: the narrow width limit of Bayesian parallel branching neural networks
Zechen Zhang, Haim Sompolinsky

TL;DR
This paper shows that in Bayesian parallel branching neural networks (BPB-NNs), narrower widths can yield performance better than or comparable to that of wider networks, especially in bias-limited scenarios, owing to symmetry breaking among branches and more robust learning within each branch.
Contribution
It introduces and analyzes a narrow width regime for Bayesian parallel branching networks, showing improved performance in this regime and readout norms that reflect the training data.
Findings
Narrower BPB-NNs exhibit more robust learning in each branch due to symmetry breaking among branches.
Performance in the narrow width limit can surpass or match that of wide networks in bias-limited cases.
The phenomenon extends to other architectures such as residual MLPs and graph neural networks.
Abstract
The infinite width limit of random neural networks is known to yield the Neural Network Gaussian Process (NNGP) (Lee et al., 2018), characterized by task-independent kernels. It is widely accepted that larger network widths contribute to improved generalization (Park et al., 2019). However, this work challenges this notion by investigating the narrow width limit of the Bayesian Parallel Branching Neural Network (BPB-NN), an architecture that resembles neural networks with residual blocks. We demonstrate that when the width of a BPB-NN is significantly smaller than the number of training examples, each branch exhibits more robust learning due to a symmetry breaking of branches in kernel renormalization. Surprisingly, the performance of a BPB-NN in the narrow width limit is generally superior or comparable to that achieved in the wide width limit in bias-limited scenarios.…
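The wide-width baseline the abstract contrasts against can be made concrete. Below is a minimal NumPy sketch under assumptions that are not taken from the paper: each branch is a stack of fully connected ReLU layers with unit-variance Gaussian weights, branches differ only in depth (the `depths` list is hypothetical), and branch outputs are summed with a 1/sqrt(L) scaling. In the infinite width limit each branch then contributes a fixed arc-cosine kernel, so the NNGP kernel is an equally weighted sum of branch kernels that never looks at the labels, and Bayesian prediction reduces to Gaussian-process regression with noise (temperature) `T`. The `weights` argument is a hypothetical hook only: the narrow-width theory breaks the symmetry between branches through kernel renormalization, but computing those renormalized factors is outside this sketch.

```python
import numpy as np


def relu_layer_kernel(K):
    """Push a pre-activation Gram matrix through one ReLU layer with
    unit weight variance, using the order-1 arc-cosine formula
    E[relu(u) relu(v)] = sqrt(ab) / (2 pi) * (sin t + (pi - t) cos t).
    Assumes no input is the zero vector (nonzero diagonal)."""
    d = np.sqrt(np.diag(K))
    norms = np.outer(d, d)
    cos_t = np.clip(K / norms, -1.0, 1.0)  # guard against rounding past +/-1
    t = np.arccos(cos_t)
    return norms / (2 * np.pi) * (np.sin(t) + (np.pi - t) * cos_t)


def branch_kernels(X, depths):
    """Infinite-width kernel of each branch; hypothetical branch l
    applies depths[l] ReLU layers to the shared input X."""
    K0 = X @ X.T / X.shape[1]  # input-layer kernel
    kernels = []
    for depth in depths:
        K = K0
        for _ in range(depth):
            K = relu_layer_kernel(K)
        kernels.append(K)
    return kernels


def nngp_predict(X_train, y_train, X_test, depths, T=0.01, weights=None):
    """GP posterior mean with a branch-summed NNGP kernel.
    weights=None gives the symmetric wide-limit weighting 1/L per branch."""
    X = np.vstack([X_train, X_test])
    Ks = branch_kernels(X, depths)
    if weights is None:
        weights = np.full(len(depths), 1.0 / len(depths))
    K = sum(w * Kl for w, Kl in zip(weights, Ks))
    P = X_train.shape[0]
    K_train = K[:P, :P]
    K_cross = K[P:, :P]
    alpha = np.linalg.solve(K_train + T * np.eye(P), y_train)
    return K_cross @ alpha


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_train = rng.standard_normal((50, 10))
    y_train = np.sin(X_train[:, 0])
    X_test = rng.standard_normal((5, 10))
    pred = nngp_predict(X_train, y_train, X_test, depths=[1, 2, 3])
    print(pred.shape)  # (5,)
```

Because `relu_layer_kernel` acts entrywise (each output entry depends only on K_ij, K_ii and K_jj), the train and test blocks can be sliced from one joint Gram matrix instead of being computed separately.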
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and ELM · Face and Expression Recognition
Methods: Gaussian Process · Graph Neural Network
