Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, Raphael Gontijo-Lopes, Ari S. Morcos, Hongseok Namkoong, Ali Farhadi, Yair Carmon, Simon Kornblith, Ludwig Schmidt

TL;DR
This paper introduces "model soups": averaging the weights of multiple fine-tuned models to improve accuracy and robustness without increasing inference cost. The authors report state-of-the-art ImageNet results with ViT-G and gains on several other tasks.
Contribution
The paper proposes model soups, a weight-averaging technique that improves the accuracy and robustness of fine-tuned large pre-trained models.
Findings
Model soups improve accuracy over individual fine-tuned models.
Model soups achieve state-of-the-art results on ImageNet with ViT-G.
The approach extends to multiple tasks and improves out-of-distribution and zero-shot performance.
Abstract
The conventional recipe for maximizing model accuracy is to (1) train multiple models with various hyperparameters and (2) pick the individual model which performs best on a held-out validation set, discarding the remainder. In this paper, we revisit the second step of this procedure in the context of fine-tuning large pre-trained models, where fine-tuned models often appear to lie in a single low error basin. We show that averaging the weights of multiple models fine-tuned with different hyperparameter configurations often improves accuracy and robustness. Unlike a conventional ensemble, we may average many models without incurring any additional inference or memory costs -- we call the results "model soups." When fine-tuning large pre-trained models such as CLIP, ALIGN, and a ViT-G pre-trained on JFT, our soup recipe provides significant improvements over the best model in a…
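The weight-averaging step described in the abstract can be illustrated with a minimal sketch. It assumes every checkpoint comes from the same architecture and the same pre-trained initialization, and it represents each checkpoint as a plain dictionary of flat float lists. The name `uniform_soup`, the `StateDict` alias, and the toy parameter values are hypothetical and are not taken from the paper's code; real checkpoints would hold tensors rather than lists.

```python
from typing import Dict, List

# Hypothetical toy representation: parameter name -> flat list of floats.
StateDict = Dict[str, List[float]]


def uniform_soup(checkpoints: List[StateDict]) -> StateDict:
    """Average parameters element-wise across checkpoints of one architecture."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    names = checkpoints[0].keys()
    for ckpt in checkpoints:
        if ckpt.keys() != names:
            raise ValueError("checkpoints must share identical parameter names")
    n = len(checkpoints)
    soup: StateDict = {}
    for name in names:
        # Group the i-th entry of this parameter from every checkpoint.
        # Note: zip stops at the shortest list, so shapes are assumed to match.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        soup[name] = [sum(values) / n for values in columns]
    return soup


# Three hypothetical fine-tuning runs with different hyperparameters.
runs = [
    {"linear.weight": [1.0, 2.0], "linear.bias": [0.0]},
    {"linear.weight": [3.0, 4.0], "linear.bias": [3.0]},
    {"linear.weight": [5.0, 6.0], "linear.bias": [6.0]},
]
soup = uniform_soup(runs)
print(soup)
```

The result is a single set of weights with the same shape as any one input checkpoint, which is why serving it costs no more than serving one model, whereas an ensemble would have to run every member at inference time.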
Code & Models
- 🤗 abacusai/TheProfessor-155b · model · 125 dl · ♡ 101
- 🤗 julien-c/Mistral-7B-Neural-Story-mix · model · 19 dl · ♡ 7
- 🤗 pcuenq/Mistral-7B-Neural-Story-mix · model · 4 dl · ♡ 1
- 🤗 chargoddard/average-dolphin-8x7B · model · 100 dl · ♡ 1
- 🤗 dustydecapod/Jovian-10.7B-v1.0 · model · 13 dl · ♡ 1
- 🤗 gqd/mistral-merge-7b · model · 96 dl · ♡ 1
- 🤗 NLPinas/yi-bagel-2x34b · model · 97 dl · ♡ 2
- 🤗 LoneStriker/Air-Striker-Mixtral-8x7B-Instruct-ZLoss-3.5bpw-h6-exl2 · model · 1 dl
- 🤗 LoneStriker/Air-Striker-Mixtral-8x7B-Instruct-ZLoss-3.75bpw-h6-exl2 · model · ♡ 9
- 🤗 LoneStriker/Air-Striker-Mixtral-8x7B-Instruct-ZLoss-4.0bpw-h6-exl2 · model · 6 dl
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Machine Learning and Data Classification
Methods: Model Soups · ALIGN · Contrastive Language-Image Pre-training
