Deep Ensembles on a Fixed Memory Budget: One Wide Network or Several Thinner Ones?

Nadezhda Chirkova; Ekaterina Lobacheva; Dmitry Vetrov

arXiv:2005.07292 · cs.LG · May 18, 2020 · 6 cites

TL;DR

This paper investigates whether, under a fixed memory budget, training multiple thinner networks as an ensemble outperforms a single wide network, demonstrating the effectiveness of memory splitting across various datasets and architectures.

Contribution

It introduces the concept of the Memory Split Advantage, showing that an ensemble of thinner networks often surpasses a single wide network in accuracy under a fixed memory budget.

Findings

01

For large enough memory budgets, an ensemble of thinner networks usually outperforms a single wide network with the same total parameter count.

02

The optimal number of networks in an ensemble increases with larger memory budgets.

03

The Memory Split Advantage is consistent across different datasets and architectures.

Abstract

One of the generally accepted views of modern deep learning is that increasing the number of parameters usually leads to better quality. The two easiest ways to increase the number of parameters are to increase the size of the network, e.g. its width, or to train a deep ensemble; both approaches improve performance in practice. In this work, we consider a fixed memory budget setting and investigate which is more effective: to train a single wide network, or to perform a memory split, that is, to train an ensemble of several thinner networks with the same total number of parameters. We find that, for large enough budgets, the number of networks in the ensemble corresponding to the optimal memory split is usually larger than one. Interestingly, this effect holds for the commonly used sizes of the standard architectures. For example, one WideResNet-28-10 achieves significantly worse test…
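
The arithmetic behind a memory split can be illustrated with a minimal sketch. It assumes a plain fully connected network rather than the WideResNet mentioned in the abstract, purely because its parameter count is easy to write down; the function names `mlp_param_count` and `widest_fitting_width` are hypothetical and do not come from the paper. Given a total parameter budget, the sketch finds the largest hidden width at which a chosen number of identical networks still fits within that budget.

```python
def mlp_param_count(width, depth, in_dim, out_dim):
    """Weights plus biases of a fully connected net with `depth`
    hidden layers of equal `width` (an illustrative stand-in, not
    an architecture taken from the paper)."""
    params = in_dim * width + width                   # input -> first hidden layer
    params += (depth - 1) * (width * width + width)   # hidden -> hidden layers
    params += width * out_dim + out_dim               # last hidden -> output
    return params


def widest_fitting_width(budget, n_networks, depth, in_dim, out_dim):
    """Largest width such that `n_networks` identical nets together
    use at most `budget` parameters (assumes at least width 0 fits)."""
    per_net = budget // n_networks

    def count(w):
        return mlp_param_count(w, depth, in_dim, out_dim)

    # Grow an upper bound, then binary-search the largest fitting width.
    lo, hi = 0, 1
    while count(hi) <= per_net:
        hi *= 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if count(mid) <= per_net:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    # Budget defined by one wide network; all sizes here are assumed values.
    budget = mlp_param_count(width=512, depth=3, in_dim=784, out_dim=10)
    for k in (1, 2, 4, 8):
        w = widest_fitting_width(budget, k, depth=3, in_dim=784, out_dim=10)
        print(f"{k} network(s): width {w}")
```

Because the hidden-to-hidden weights grow quadratically with width, splitting the budget across K networks shrinks each member's width by roughly a factor of the square root of K, not K. The sketch only computes candidate splits; which split gives the best test accuracy is the empirical question the paper studies.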

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification