Effective training-time stacking for ensembling of deep neural networks

Polina Proscura; Alexey Zaytsev

arXiv:2206.13491 · cs.LG · June 28, 2022

TL;DR

This paper introduces a training-time stacking method for deep neural network ensembling that selects and weights models during training, improving ensemble quality without additional validation data or increased training time.

Contribution

The paper proposes a training-time weighting approach for snapshot ensembling that improves ensemble performance without increasing training time.

Findings

01. Weighted ensembles outperform vanilla snapshot ensembles.

02. The method achieves higher accuracy on Fashion MNIST, CIFAR-10, and CIFAR-100.

03. Training time remains similar to single model training.

Abstract

Ensembling is a popular and effective method for improving machine learning (ML) models. It proves its value not only in classical ML but also in deep learning. Ensembles enhance the quality and trustworthiness of ML solutions and allow uncertainty estimation. However, they come at a price: training ensembles of deep learning models consumes a huge amount of computational resources. Snapshot ensembling collects the models of the ensemble along a single training path. Because training runs only once, the computational time is similar to that of training one model. However, the quality of the models along the training path varies: typically, later models are better if no overfitting occurs. So, the models are of varying utility. Our method improves snapshot ensembling by selecting and weighting ensemble members along the training path. It relies on training-time likelihoods without…
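
The selecting-and-weighting idea described in the abstract can be illustrated with a minimal sketch. The abstract is truncated here and does not state the paper's actual weighting rule, so the rule below (a softmax over each snapshot's mean training log-likelihood, with an optional top-k selection) is an assumption made for illustration only, and the names `snapshot_weights`, `weighted_ensemble`, `temperature`, and `keep` are hypothetical.

```python
import numpy as np


def snapshot_weights(train_loglik, temperature=1.0, keep=None):
    """Map per-snapshot mean training log-likelihoods to ensemble weights.

    Assumption: a softmax over log-likelihoods. This is NOT confirmed to be
    the paper's rule; it only illustrates weighting from training-time signals.
    """
    scores = np.asarray(train_loglik, dtype=float) / temperature
    if keep is not None:
        # Selection step: zero out all but the `keep` best-scoring snapshots
        # (ties at the cutoff are all kept).
        cutoff = np.sort(scores)[-keep]
        scores = np.where(scores >= cutoff, scores, -np.inf)
    scores = scores - scores.max()  # subtract the max for numerical stability
    w = np.exp(scores)              # exp(-inf) == 0 for dropped snapshots
    return w / w.sum()


def weighted_ensemble(probs, weights):
    """Combine snapshot predictions.

    probs:   array of shape (S, N, C), class probabilities per snapshot
    weights: array of shape (S,), non-negative and summing to 1
    returns: array of shape (N, C)
    """
    return np.tensordot(weights, probs, axes=1)


# Synthetic stand-ins: 4 snapshots, 5 examples, 3 classes.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(3), size=(4, 5))
train_loglik = [-1.2, -0.8, -0.5, -0.4]  # later snapshots fit better

weights = snapshot_weights(train_loglik, keep=3)
ensemble = weighted_ensemble(probs, weights)
predictions = ensemble.argmax(axis=1)
```

Because the weights come only from quantities already available during training, a sketch like this needs no held-out validation set, which matches the motivation stated in the TL;DR. A vanilla snapshot ensemble corresponds to uniform weights over the kept snapshots.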

Taxonomy

Topics: Machine Learning and Data Classification · Neural Networks and Applications · Anomaly Detection Techniques and Applications