When Ensembling Smaller Models is More Efficient than Single Large Models
Dan Kondratyuk, Mingxing Tan, Matthew Brown, and Boqing Gong

TL;DR
Ensembling smaller, diverse models can outperform single large models in accuracy and efficiency, especially as models grow larger, offering a flexible alternative to increasing model size for better performance.
Contribution
This paper demonstrates that ensembling smaller models can be more effective and efficient than training larger models, challenging the common practice in model scaling.
Findings
Ensembles of smaller models can reach higher accuracy than single larger models on CIFAR-10 and ImageNet.
They can do so with fewer total FLOPs, even when the individual models' weights and hyperparameters are highly optimized.
The advantage of ensembling widens as models become larger.
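The comparison behind these findings can be sketched as simple bookkeeping: an ensemble's inference cost is the sum of its members' FLOPs, and the ensemble is the better choice when it is at least as accurate as the single model at no greater cost. The sketch below assumes this framing. The `Candidate` type, the `dominates` helper, and every number in it are hypothetical placeholders for illustration, not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    flops: float     # forward-pass FLOPs per example
    accuracy: float  # measured top-1 accuracy in [0, 1]


def ensemble_flops(member_flops):
    """Total inference cost of an ensemble: every member runs once per example."""
    return sum(member_flops)


def dominates(a, b):
    """True if `a` is no worse than `b` on both axes and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.flops <= b.flops
    strictly_better = a.accuracy > b.accuracy or a.flops < b.flops
    return no_worse and strictly_better


# Hypothetical placeholder numbers, NOT taken from the paper.
small_flops = 1.0e9
ensemble = Candidate("3x small", ensemble_flops([small_flops] * 3), accuracy=0.80)
large = Candidate("1x large", flops=4.0e9, accuracy=0.79)

result = dominates(ensemble, large)
print(f"{ensemble.name} dominates {large.name}: {result}")
```

Note that the ensemble's accuracy has to be measured on held-out data; it cannot be derived from the members' individual accuracies.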
Abstract
Ensembling is a simple and popular technique for boosting evaluation performance by training multiple models (e.g., with different initializations) and aggregating their predictions. This approach is commonly reserved for the largest models, as it is commonly held that increasing the model size provides a more substantial reduction in error than ensembling smaller models. However, we show results from experiments on CIFAR-10 and ImageNet that ensembles can outperform single models with both higher accuracy and requiring fewer total FLOPs to compute, even when those individual models' weights and hyperparameters are highly optimized. Furthermore, this gap in improvement widens as models become large. This presents an interesting observation that output diversity in ensembling can often be more efficient than training larger models, especially when the models approach the size of what…
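As a minimal sketch of "aggregating their predictions", the example below averages the softmax probabilities of several independently trained classifiers and picks the class with the highest mean probability. Averaging probabilities is one common aggregation rule and is assumed here; the abstract does not say which rule the authors use. The function names and the logits are made up for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(per_model_logits):
    """Average class probabilities across models.

    Returns (predicted class index, averaged probabilities).
    """
    probs = [softmax(logits) for logits in per_model_logits]
    n_models = len(probs)
    n_classes = len(probs[0])
    avg = [sum(p[c] for p in probs) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__), avg


# Made-up logits from three models for one example with three classes.
# The first model prefers class 0; the other two prefer class 1.
logits = [
    [2.0, 1.0, 0.1],
    [0.5, 2.5, 0.3],
    [0.2, 2.0, 0.1],
]
label, avg_probs = ensemble_predict(logits)
print(label, [round(p, 3) for p in avg_probs])
```

Because the members disagree on this example, the averaged distribution reflects the majority preference for class 1, which is the kind of output diversity the abstract refers to.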
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
