When Ensembling Smaller Models is More Efficient than Single Large Models

Dan Kondratyuk, Mingxing Tan, Matthew Brown, and Boqing Gong

arXiv:2005.00570 · cs.LG · May 5, 2020 · 24 cites

Open Access

TL;DR

Ensembles of smaller, diverse models can outperform single large models in both accuracy and computational cost (FLOPs), and the advantage grows as models get larger, making ensembling a flexible alternative to simply increasing model size.

Contribution

This paper shows that ensembling smaller models can be more accurate and more compute-efficient than training a single larger model, challenging the common practice of scaling up model size and reserving ensembling for only the largest models.

Findings

1. Ensembles can outperform single models in accuracy on CIFAR-10 and ImageNet.
2. Ensembles can match or exceed the accuracy of larger single models while requiring fewer total FLOPs.
3. The advantage of ensembling increases with model size.
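The FLOPs comparison in the second finding rests on simple bookkeeping: every ensemble member runs on each input, so the ensemble's inference cost is the sum of its members' costs, and the ensemble is cheaper than a single model whenever that sum is smaller. The sketch below illustrates only this accounting. The function names `ensemble_flops` and `cheaper_than` and all FLOP figures are hypothetical placeholders, not numbers reported in the paper.

```python
def ensemble_flops(member_flops):
    """Per-input inference cost of an ensemble.

    Every member processes each input, so the cost is the sum of the
    members' individual costs.
    """
    return sum(member_flops)


def cheaper_than(member_flops, single_model_flops):
    """Return True if the whole ensemble needs fewer FLOPs than one model."""
    return ensemble_flops(member_flops) < single_model_flops


# Hypothetical placeholder figures (GFLOPs per image), NOT from the paper.
small_members = [1.0, 1.0]  # two copies of a small model
large_model = 4.0           # one larger model

print(ensemble_flops(small_members))
print(cheaper_than(small_members, large_model))
```

Cost is only half of the comparison: the paper's claim is that, at a comparable or lower FLOP budget, the ensemble can also be more accurate, which has to be measured empirically for each configuration.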

Abstract

Ensembling is a simple and popular technique for boosting evaluation performance by training multiple models (e.g., with different initializations) and aggregating their predictions. This approach is usually reserved for the largest models, as it is commonly held that increasing the model size provides a more substantial reduction in error than ensembling smaller models. However, we show results from experiments on CIFAR-10 and ImageNet that ensembles can outperform single models, achieving higher accuracy while requiring fewer total FLOPs to compute, even when those individual models' weights and hyperparameters are highly optimized. Furthermore, this gap in improvement widens as models become large. This presents an interesting observation that output diversity in ensembling can often be more efficient than training larger models, especially when the models approach the size of what…
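The abstract describes ensembling as training several models (e.g., from different random initializations) and aggregating their predictions, but it does not state the aggregation rule. The minimal sketch below assumes one common choice, averaging the members' softmax probabilities and taking the argmax; the function names `softmax` and `ensemble_predict` and the example logits are hypothetical illustrations, not the paper's code or data.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(member_logits):
    """Average the members' class probabilities (assumed aggregation rule).

    member_logits: one list of logits per ensemble member, all the same length.
    Returns (averaged probabilities, index of the predicted class).
    """
    probs = [softmax(logits) for logits in member_logits]
    n_members = len(probs)
    n_classes = len(probs[0])
    avg = [sum(p[c] for p in probs) / n_members for c in range(n_classes)]
    pred = max(range(n_classes), key=avg.__getitem__)
    return avg, pred


# Three hypothetical members scoring the same input over 3 classes.
members = [
    [2.0, 1.0, 0.1],
    [0.5, 2.5, 0.3],
    [1.8, 1.2, 0.2],
]
avg_probs, label = ensemble_predict(members)
print(label)
```

With these logits, two members individually favor class 0, but the second member is confident enough in class 1 that the averaged probabilities favor class 1. Probability averaging weighs confidence, whereas a majority vote would pick class 0; which rule suits a given ensemble is a design choice.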

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings