Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

Mitchell Wortsman; Gabriel Ilharco; Samir Yitzhak Gadre; Rebecca Roelofs; Raphael Gontijo-Lopes; Ari S. Morcos; Hongseok Namkoong; Ali Farhadi; Yair Carmon; Simon Kornblith; Ludwig Schmidt

arXiv:2203.05482 · cs.LG · July 5, 2022 · 205 cites

Open Access 5 Repos 10 Models

TL;DR

This paper introduces 'model soups', a method of averaging weights of multiple fine-tuned models to improve accuracy and robustness without increasing inference costs, demonstrating state-of-the-art results across various tasks.

Contribution

The paper proposes model soups: averaging the weights of multiple models fine-tuned with different hyperparameter configurations, rather than keeping only the best one, to improve accuracy and robustness when fine-tuning large pre-trained models.

Findings

1. Model soups improve accuracy over individual fine-tuned models.
2. Model soups achieve state-of-the-art results on ImageNet with ViT-G.
3. The approach extends to multiple tasks and improves out-of-distribution and zero-shot performance.

Abstract

The conventional recipe for maximizing model accuracy is to (1) train multiple models with various hyperparameters and (2) pick the individual model which performs best on a held-out validation set, discarding the remainder. In this paper, we revisit the second step of this procedure in the context of fine-tuning large pre-trained models, where fine-tuned models often appear to lie in a single low error basin. We show that averaging the weights of multiple models fine-tuned with different hyperparameter configurations often improves accuracy and robustness. Unlike a conventional ensemble, we may average many models without incurring any additional inference or memory costs -- we call the results "model soups." When fine-tuning large pre-trained models such as CLIP, ALIGN, and a ViT-G pre-trained on JFT, our soup recipe provides significant improvements over the best model in a…
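The core operation described in the abstract, averaging the weights of several fine-tuned models, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the helper name `uniform_soup`, the use of plain NumPy dictionaries in place of real checkpoints, and the toy parameter values are all assumptions made for the example. It assumes every model shares the same architecture and parameter names.

```python
import numpy as np


def uniform_soup(state_dicts):
    """Average a list of parameter dicts elementwise.

    Hypothetical helper for illustration: each dict maps a parameter
    name to a NumPy array, and all dicts must have identical keys.
    """
    if not state_dicts:
        raise ValueError("need at least one model")
    keys = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != keys:
            raise ValueError("all models must share the same parameter names")
    # One averaged array per parameter; the result has the same size as
    # a single model, so inference cost does not grow with the number
    # of ingredients.
    return {k: np.mean([sd[k] for sd in state_dicts], axis=0) for k in keys}


# Toy stand-ins for three fine-tuned checkpoints (values are made up).
model_a = {"w": np.array([1.0, 2.0]), "b": np.array([0.0])}
model_b = {"w": np.array([3.0, 4.0]), "b": np.array([3.0])}
model_c = {"w": np.array([5.0, 0.0]), "b": np.array([0.0])}

soup = uniform_soup([model_a, model_b, model_c])
print(soup["w"], soup["b"])
```

Unlike an ensemble, which would run all three models at inference time and combine their outputs, the soup is a single set of weights that is loaded and evaluated like any one of its ingredients.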

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Machine Learning and Data Classification

Methods: Model Soups · ALIGN · Contrastive Language-Image Pre-training