Effective training-time stacking for ensembling of deep neural networks
Polina Proscura, Alexey Zaytsev

TL;DR
This paper introduces a training-time stacking method for deep neural network ensembling that selects and weights models during training, improving ensemble quality without additional validation data or increased training time.
Contribution
It proposes a novel training-time weighting approach for snapshot ensembling that enhances model ensemble performance efficiently.
Findings
Weighted ensembles outperform vanilla snapshot ensembles.
The method achieves higher accuracy on Fashion MNIST, CIFAR-10, and CIFAR-100.
Training time remains similar to that of training a single model.
Abstract
Ensembling is a popular and effective method for improving machine learning (ML) models. It proves its value not only in classical ML but also in deep learning. Ensembles enhance the quality and trustworthiness of ML solutions and allow uncertainty estimation. However, they come at a price: training an ensemble of deep learning models consumes a huge amount of computational resources. Snapshot ensembling collects the ensemble members along a single training path. Because training runs only once, the computational time is similar to that of training one model. However, the quality of the models along the training path varies: typically, later models are better if no overfitting occurs. The models are therefore of varying utility. Our method improves snapshot ensembling by selecting and weighting ensemble members along the training path. It relies on training-time likelihoods without…
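To make the idea of selecting and weighting snapshots concrete, here is a minimal sketch in plain Python. The abstract is truncated, so this is not the authors' procedure: the softmax weighting over mean training log-likelihoods, the optional `top_k` selection, and the function names `snapshot_weights` and `weighted_ensemble` are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def snapshot_weights(train_log_likelihoods, top_k=None):
    """Turn per-snapshot training log-likelihoods into ensemble weights.

    ASSUMPTION: softmax weighting and top-k selection are illustrative
    choices, not the rule used in the paper.
    """
    n = len(train_log_likelihoods)
    k = n if top_k is None else min(top_k, n)
    # Selection: keep the k snapshots with the highest training log-likelihood.
    keep = sorted(range(n), key=lambda i: train_log_likelihoods[i], reverse=True)[:k]
    kept_weights = softmax([train_log_likelihoods[i] for i in keep])
    # Weighting: dropped snapshots get weight zero.
    weights = [0.0] * n
    for i, w in zip(keep, kept_weights):
        weights[i] = w
    return weights


def weighted_ensemble(snapshot_probs, weights):
    """Combine class probabilities for one input.

    snapshot_probs[m][c] is the probability of class c from snapshot m.
    """
    n_classes = len(snapshot_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, snapshot_probs))
        for c in range(n_classes)
    ]


# Toy example: three snapshots, with later ones fitting the training data better.
log_liks = [-1.2, -0.7, -0.4]
probs = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
weights = snapshot_weights(log_liks)
prediction = weighted_ensemble(probs, weights)
```

Setting every weight to `1 / len(probs)` recovers the vanilla snapshot ensemble, which averages all snapshots equally. In this sketch the weights depend only on quantities available during training, so no held-out validation data is needed.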
Taxonomy
Topics: Machine Learning and Data Classification · Neural Networks and Applications · Anomaly Detection Techniques and Applications
