WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average

Louis Fournier (MLIA), Adel Nabli (MLIA, Mila), Masih Aminbeidokhti (ETS), Marco Pedersoli (ETS), Eugene Belilovsky (Mila), Edouard Oyallon

arXiv:2405.17517 · cs.LG · May 29, 2024

TL;DR

WASH is a distributed method for training an ensemble of models for weight averaging. It shuffles weights between the models during training to keep them in the same loss basin, at a lower communication cost than existing methods.

Contribution

WASH introduces a new weight shuffling technique during training to enhance ensemble accuracy and efficiency with lower communication overhead.
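The shuffling idea can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `shuffle_weights`, the per-coordinate selection probability `p`, and the choice to apply an independent random permutation across models for each selected coordinate. Models are represented as flat lists of floats.

```python
import random

def shuffle_weights(models, p, rng):
    """Hypothetical sketch: for each parameter coordinate, with probability p,
    permute the values held at that coordinate across the models."""
    n_models = len(models)
    n_params = len(models[0])
    shuffled = [list(m) for m in models]  # copy so the inputs stay untouched
    for j in range(n_params):
        if rng.random() < p:
            perm = list(range(n_models))
            rng.shuffle(perm)  # random reassignment of coordinate j
            for k in range(n_models):
                shuffled[k][j] = models[perm[k]][j]
    return shuffled

# Three toy "models" with four parameters each (illustrative values).
models = [
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 20.0, 30.0, 40.0],
    [100.0, 200.0, 300.0, 400.0],
]
rng = random.Random(0)
unchanged = shuffle_weights(models, p=0.0, rng=rng)
fully_shuffled = shuffle_weights(models, p=1.0, rng=rng)
```

In this sketch a shuffle only moves values between models, so the set of values at each coordinate is preserved, and only the selected coordinates would need to be exchanged between workers.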

Findings

01. Achieves state-of-the-art image classification accuracy.

02. Maintains models in the same loss basin through weight shuffling.

03. Reduces communication costs compared to existing methods.

Abstract

The performance of deep neural networks is enhanced by ensemble methods, which average the output of several models. However, this comes at an increased cost at inference. Weight averaging methods aim at balancing the generalization of ensembling and the inference speed of a single model by averaging the parameters of an ensemble of models. Yet, naive averaging results in poor performance as models converge to different loss basins, and aligning the models to improve the performance of the average is challenging. Alternatively, inspired by distributed training, methods like DART and PAPA have been proposed to train several models in parallel such that they will end up in the same basin, resulting in good averaging accuracy. However, these methods either compromise ensembling accuracy or demand significant communication between models during training. In this paper, we introduce WASH, a…
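The distinction the abstract draws between output ensembling and weight averaging can be made concrete with a toy example. This is a sketch under stated assumptions, not the paper's setup: the one-hidden-unit ReLU model, the helper names, and the parameter values are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
def relu(z):
    return max(0.0, z)

def predict(params, x):
    """Toy network with one hidden unit: w2 * relu(w1 * x)."""
    w1, w2 = params
    return w2 * relu(w1 * x)

def ensemble_predict(models, x):
    """Output ensembling: average the predictions, one forward pass per model."""
    return sum(predict(m, x) for m in models) / len(models)

def average_weights(models):
    """Weight averaging: average parameters coordinate-wise into one model."""
    n = len(models)
    return tuple(sum(m[i] for m in models) / n for i in range(len(models[0])))

# Two dissimilar toy models (illustrative values only).
models = [(1.0, 2.0), (-1.0, 2.0)]
x = 3.0

ens = ensemble_predict(models, x)    # needs len(models) forward passes
avg_model = average_weights(models)  # a single model
avg = predict(avg_model, x)          # needs one forward pass
```

Here the ensemble output is 3.0, while the weight-averaged model outputs 0.0 because the two hidden weights cancel. The toy case only shows that the two operations differ for nonlinear models and that averaging dissimilar parameters can behave very differently from averaging outputs.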

Taxonomy

Topics: IoT and Edge/Fog Computing · Energy Efficient Wireless Sensor Networks

Methods: Difficulty-Aware Rejection Tuning · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings