Embedded Ensembles: Infinite Width Limit and Operating Regimes
Maksim Velikanov, Roman Kail, Ivan Anokhin, Roman Vashurin, Maxim Panov, Alexey Zaytsev, Dmitry Yarotsky

TL;DR
This paper analyzes embedded ensembles in neural networks, revealing two operating regimes and providing a theoretical framework for understanding their behavior and performance scaling.
Contribution
It introduces a Neural-Tangent-Kernel-based theory for embedded ensembles, identifying independent and collective regimes and analyzing their properties.
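The theory builds on the Neural Tangent Kernel (NTK). For orientation only, here is a minimal sketch of the generic empirical NTK, Θ(x, x') = ∇θ f(x) · ∇θ f(x'), for a plain two-layer ReLU network. This is not the paper's embedded-ensemble kernel. The architecture f(x) = v · relu(W x) / sqrt(m), the 1/sqrt(m) NTK parametrization, and the function name `empirical_ntk` are illustrative assumptions.

```python
import numpy as np

def empirical_ntk(X, W, v):
    """Empirical NTK of f(x) = v . relu(W x) / sqrt(m) on inputs X of shape (n, d).

    Illustrative two-layer network (an assumption, not the paper's model).
    """
    m = W.shape[0]
    pre = X @ W.T                      # (n, m) pre-activations
    act = np.maximum(pre, 0.0)         # ReLU activations
    gate = (pre > 0).astype(float)     # ReLU derivative
    # Contribution of the output weights v: <relu(Wx), relu(Wx')> / m
    k_v = act @ act.T / m
    # Contribution of the hidden weights W: (x . x') * sum_j v_j^2 g_j(x) g_j(x') / m
    k_W = (X @ X.T) * ((gate * v**2) @ gate.T) / m
    return k_v + k_W

rng = np.random.default_rng(0)
n, d, m = 6, 3, 512                    # hypothetical sizes
X = rng.normal(size=(n, d))
W = rng.normal(size=(m, d))
v = rng.normal(size=m)
K = empirical_ntk(X, W, v)

# Cross-check against an explicit parameter Jacobian J of shape (n, m + m*d):
# the empirical NTK is exactly J @ J.T.
pre = X @ W.T
J_v = np.maximum(pre, 0.0) / np.sqrt(m)
J_W = ((v * (pre > 0))[:, :, None] * X[:, None, :]).reshape(n, -1) / np.sqrt(m)
J = np.concatenate([J_v, J_W], axis=1)
print(np.allclose(K, J @ J.T))  # True
```

Computing the kernel in closed form avoids materializing the Jacobian, which grows with width m. The explicit-Jacobian check shows that both routes agree.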
Findings
Embedded ensembles can operate in independent or collective regimes.
Theoretical predictions match empirical results across various network sizes.
Performance scaling depends on network width and ensemble size.
Abstract
A memory-efficient approach to ensembling neural networks is to share most weights among the ensembled models by means of a single reference network. We refer to this strategy as Embedded Ensembling (EE); particular examples are BatchEnsembles and Monte-Carlo dropout ensembles. In this paper, we perform a systematic theoretical and empirical analysis of embedded ensembles with different numbers of models. Theoretically, we use a Neural-Tangent-Kernel-based approach to derive the wide network limit of the gradient descent dynamics. In this limit, we identify two ensemble regimes, independent and collective, depending on the architecture and initialization strategy of ensemble models. We prove that in the independent regime the embedded ensemble behaves as an ensemble of independent models. We confirm our theoretical prediction with a wide range of experiments with finite networks,…
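The abstract names BatchEnsembles and Monte-Carlo dropout ensembles as embedded ensembles. This is a minimal NumPy sketch of one dense layer under each scheme. In both, every member reuses a single shared weight matrix W and adds only a cheap per-member modification. For BatchEnsemble, member i uses the rank-1 modulated weight W ∘ (r_i s_iᵀ). For MC dropout, member i uses a fixed binary mask on the layer inputs. The function names, sizes, random-sign initialization of r_i and s_i, and keep probability are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def batch_ensemble_dense(x, W, R, S):
    """BatchEnsemble-style dense layer (illustrative sketch).

    x: (n_models, batch, d_in)  one input slice per ensemble member
    W: (d_in, d_out)            weight shared by all members
    R: (n_models, d_in)         per-member input scaling vectors r_i
    S: (n_models, d_out)        per-member output scaling vectors s_i
    Member i effectively uses W_i = W * outer(R[i], S[i]) without materializing it.
    """
    return ((x * R[:, None, :]) @ W) * S[:, None, :]

def mc_dropout_dense(x, W, masks, keep_prob):
    """Member i applies its fixed binary input mask m_i to the shared layer,
    i.e. it uses W_i = diag(m_i) W / keep_prob (inverted-dropout scaling)."""
    return ((x * masks[:, None, :]) @ W) / keep_prob

rng = np.random.default_rng(0)
n_models, batch, d_in, d_out = 4, 8, 5, 3             # hypothetical sizes
W = rng.normal(size=(d_in, d_out))                      # shared "reference network" weight
x = rng.normal(size=(n_models, batch, d_in))

R = rng.choice([-1.0, 1.0], size=(n_models, d_in))     # random-sign init (assumption)
S = rng.choice([-1.0, 1.0], size=(n_models, d_out))
y_be = batch_ensemble_dense(x, W, R, S)

keep_prob = 0.8                                         # assumed keep probability
masks = (rng.random((n_models, d_in)) < keep_prob).astype(float)
y_mc = mc_dropout_dense(x, W, masks, keep_prob)

# Sanity check against explicitly materialized per-member weights.
W_be = W[None] * R[:, :, None] * S[:, None, :]          # (n_models, d_in, d_out)
W_mc = masks[:, :, None] * W[None] / keep_prob
print(np.allclose(y_be, x @ W_be), np.allclose(y_mc, x @ W_mc))  # True True
```

The memory saving comes from storing W once. The per-member overhead is only the vectors R, S (or a mask), which is O(d_in + d_out) per member rather than O(d_in · d_out).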
Taxonomy
Topics: Neural Networks and Applications · Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications
Methods: Dropout
