PopulAtion Parameter Averaging (PAPA)
Alexia Jolicoeur-Martineau, Emy Gervais, Kilian Fatras, Yan Zhang, Simon Lacoste-Julien

TL;DR
PAPA is a novel method that combines the diversity of ensemble models with the efficiency of weight averaging, improving accuracy by leveraging a population of diverse neural networks.
Contribution
We introduce PAPA, a new approach that gradually averages weights of diverse models to bridge the gap between ensembling and weight averaging.
Findings
Increases accuracy by up to 1.9% on CIFAR-100.
Reduces the performance gap between weight averaging and ensembling.
Effective across multiple datasets including CIFAR-10, CIFAR-100, and ImageNet.
Abstract
Ensemble methods combine the predictions of multiple models to improve performance, but they incur significantly higher computation costs at inference time. To avoid these costs, multiple neural networks can be combined into one by averaging their weights. However, this usually performs significantly worse than ensembling. Weight averaging is only beneficial when the networks are different enough to benefit from being combined, but similar enough to average well. Based on this idea, we propose PopulAtion Parameter Averaging (PAPA): a method that combines the generality of ensembling with the efficiency of weight averaging. PAPA leverages a population of diverse models (trained on different data orders, augmentations, and regularizations) while slowly pushing the weights of the networks toward the population average of the weights. We also propose PAPA variants (PAPA-all, and PAPA-2) that average…
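The idea of "slowly pushing the weights of the networks toward the population average" can be illustrated with a minimal sketch. It is not the authors' implementation: the interpolation form `alpha * w + (1 - alpha) * average`, the function names, and the flat list-of-floats weight representation are all assumptions made for illustration. In practice each model would also take its own training steps between pushes.

```python
from typing import List

Weights = List[float]  # hypothetical flat weight vector for one model


def population_average(population: List[Weights]) -> Weights:
    """Element-wise mean of the weights across all models."""
    n = len(population)
    return [sum(ws) / n for ws in zip(*population)]


def push_toward_average(population: List[Weights], alpha: float) -> List[Weights]:
    """Interpolate every model toward the population average.

    alpha close to 1 means a gentle push (assumed form, not taken from the source);
    alpha = 1 leaves the models unchanged, alpha = 0 collapses them to the average.
    """
    avg = population_average(population)
    return [
        [alpha * w + (1.0 - alpha) * a for w, a in zip(weights, avg)]
        for weights in population
    ]


if __name__ == "__main__":
    models = [[0.0, 2.0], [2.0, 4.0]]
    print(push_toward_average(models, alpha=0.5))
```

Under this assumed update the population average itself is unchanged by the push; only the spread of the models around it shrinks, which is one way to keep the models similar enough to average well while they continue to train on different data.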
Taxonomy
Topics: COVID-19 diagnosis using AI · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
