Model soups to increase inference without increasing compute time
Charles Dansereau, Milo Sobral, Maninder Bhogal, Mehdi Zalai

TL;DR
This paper evaluates different model soup strategies to improve inference performance across various vision models, introducing a new pruned soup recipe that outperforms previous methods in certain cases.
Contribution
The paper introduces a new pruned soup recipe and compares multiple soup strategies across different models, highlighting limitations of weight-averaging in model soups.
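This listing does not say how Pruned Soup selects its ingredients. The sketch below assumes one plausible reading of "pruned", which is not taken from the paper and may differ from it: start from the uniform soup of all fine-tuned models and repeatedly drop any ingredient whose removal improves held-out accuracy. Models are plain dicts of float lists standing in for framework state dicts. `pruned_soup`, `average`, `toy_val_acc` and `max_passes` are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a state dict: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def average(models: List[Weights]) -> Weights:
    """Element-wise mean of every parameter across `models` (a uniform soup)."""
    n = len(models)
    return {
        name: [sum(col) / n for col in zip(*(m[name] for m in models))]
        for name in models[0]
    }


def pruned_soup(
    ingredients: List[Weights],
    val_acc: Callable[[Weights], float],
    max_passes: int = 5,
) -> List[Weights]:
    """Assumed pruning recipe: begin with all ingredients and drop any single
    ingredient whose removal strictly improves held-out accuracy."""
    if not ingredients:
        raise ValueError("need at least one ingredient")
    kept = list(ingredients)
    best = val_acc(average(kept))
    for _ in range(max_passes):
        changed = False
        i = 0
        while i < len(kept) and len(kept) > 1:
            trial = kept[:i] + kept[i + 1:]  # soup without ingredient i
            score = val_acc(average(trial))
            if score > best:
                # Drop ingredient i; the next one has shifted into index i.
                kept, best, changed = trial, score, True
            else:
                i += 1
        if not changed:  # a full pass removed nothing, so stop early
            break
    return kept


def toy_val_acc(w: Weights) -> float:
    # Hypothetical held-out metric: highest when the single weight equals 1.0.
    return -((w["w"][0] - 1.0) ** 2)


models = [{"w": [0.0]}, {"w": [2.0]}, {"w": [5.0]}]
print(average(pruned_soup(models, toy_val_acc)))  # {'w': [1.0]}
```

In this toy run the outlier ingredient (`w = 5.0`) is pruned, because removing it moves the averaged weight onto the target.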
Findings
Model soups improved performance for Vision Transformer models.
Pruned soup outperformed uniform and greedy soups in experiments.
Limitations of weight-averaging were identified during analysis.
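Every soup recipe rests on the same operation: averaging the parameters of several fine-tuned models that share one architecture. The uniform soup is the simplest case, where all ingredients receive equal weight. The sketch below is a minimal illustration, assuming models are represented as plain dicts of float lists rather than real framework state dicts; `uniform_soup` is a hypothetical name, not the API of the authors' library.

```python
from typing import Dict, List

# Hypothetical stand-in for a state dict: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def uniform_soup(ingredients: List[Weights]) -> Weights:
    """Average every parameter element-wise across all ingredient models."""
    if not ingredients:
        raise ValueError("need at least one ingredient")
    n = len(ingredients)
    soup: Weights = {}
    for name in ingredients[0]:
        tensors = [model[name] for model in ingredients]
        # Averaging only makes sense if every model has the same shapes.
        if any(len(t) != len(tensors[0]) for t in tensors):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        soup[name] = [sum(col) / n for col in zip(*tensors)]
    return soup


models = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [1.0]},
]
print(uniform_soup(models))  # {'w': [2.0, 3.0], 'b': [0.5]}
```

Averaging is expected to help only when the ingredients are fine-tuned from a shared pre-trained initialization, as in the original model soups setting (arXiv:2203.05482). The averaged point is never evaluated during training, so nothing guarantees that it performs well.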
Abstract
In this paper, we compare the performance of Model Soups on three different models (ResNet, ViT and EfficientNet) using three Soup Recipes from arXiv:2203.05482 (Greedy Soup Sorted, Greedy Soup Random and Uniform Soup), and we reproduce the original authors' results. We then introduce a new Soup Recipe called Pruned Soup. The soups outperformed the best individual model for the pre-trained Vision Transformer, but were much worse than it for ResNet and EfficientNet. Our Pruned Soup performed better than the uniform and greedy soups presented in the original paper. We also discuss the limitations of weight-averaging that we found during the experiments. The code for our model soup library and the experiments with the different models can be found here: https://github.com/milo-sobral/ModelSoup
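The greedy recipe from arXiv:2203.05482 visits candidate models one at a time and adds each one to the soup only if held-out accuracy does not drop. The sketch below covers both greedy variants named in the abstract. The "sorted" order visits models by decreasing individual held-out accuracy, as in the original paper. The "random" order shuffles them instead, which is our assumption about what Greedy Soup Random means. Models are plain dicts of float lists, and `greedy_soup`, `average` and `toy_val_acc` are hypothetical names, not functions from the authors' repository.

```python
import random
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a state dict: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def average(models: List[Weights]) -> Weights:
    """Element-wise mean of every parameter across `models` (a uniform soup)."""
    n = len(models)
    return {
        name: [sum(col) / n for col in zip(*(m[name] for m in models))]
        for name in models[0]
    }


def greedy_soup(
    ingredients: List[Weights],
    val_acc: Callable[[Weights], float],
    order: str = "sorted",
    seed: Optional[int] = None,
) -> List[Weights]:
    """Return the ingredients kept by the greedy recipe.

    order="sorted": visit models by decreasing individual held-out accuracy.
    order="random": visit models in a shuffled (optionally seeded) order.
    """
    if not ingredients:
        raise ValueError("need at least one ingredient")
    if order == "sorted":
        candidates = sorted(ingredients, key=val_acc, reverse=True)
    elif order == "random":
        candidates = list(ingredients)
        random.Random(seed).shuffle(candidates)
    else:
        raise ValueError(f"unknown order: {order!r}")

    kept = [candidates[0]]
    best = val_acc(average(kept))
    for model in candidates[1:]:
        score = val_acc(average(kept + [model]))
        # Keep the candidate only if held-out accuracy does not decrease.
        if score >= best:
            kept.append(model)
            best = score
    return kept


def toy_val_acc(w: Weights) -> float:
    # Hypothetical held-out metric: highest when the single weight equals 1.0.
    return -((w["w"][0] - 1.0) ** 2)


models = [{"w": [0.0]}, {"w": [2.0]}, {"w": [5.0]}]
print(average(greedy_soup(models, toy_val_acc)))  # {'w': [1.0]}
```

Because each candidate is accepted only against the current soup, the random variant can keep a different subset depending on the visiting order. Fixing `seed` makes a run reproducible.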
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
Methods: Lib · Model Soups · Depthwise Convolution · Pointwise Convolution · Dense Connections · Kaiming Initialization · Depthwise Separable Convolution · Inverted Residual Block · Residual Connection
