Learning Scalable Model Soup on a Single GPU: An Efficient Subspace Training Strategy

Tao Li; Weisen Jiang; Fanghui Liu; Xiaolin Huang; James T. Kwok

arXiv:2407.03641 · cs.LG · July 24, 2024

TL;DR

This paper introduces MEHL-Soup, a memory-efficient and faster way to learn model soups. It formulates the learned soup as a hyperplane optimization problem, so the soup can be built on a single GPU.

Contribution

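The paper proposes MEHL-Soup, which recasts the learned soup as a hyperplane optimization problem and learns the mixing coefficients with block coordinate gradient descent. Each iteration loads only a few fine-tuned models and builds a computational graph for one combined model, which cuts memory and computation costs enough for single-GPU training.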

Findings

1. MEHL-Soup outperforms Learned-Soup in test accuracy.
2. Memory usage is reduced by more than 13×.
3. Soup construction is 9× faster.

Abstract

Pre-training followed by fine-tuning is widely adopted among practitioners. Performance can be further improved by "model soups" (Wortsman et al., 2022), which explore various hyperparameter configurations. Learned-Soup, a variant of model soups, significantly improves performance but suffers from substantial memory and time costs because it requires (i) loading all fine-tuned models simultaneously and (ii) building a large computational graph that encompasses all fine-tuned models. In this paper, we propose Memory Efficient Hyperplane Learned Soup (MEHL-Soup) to tackle this issue by formulating the learned soup as a hyperplane optimization problem and introducing block coordinate gradient descent to learn the mixing coefficients. At each iteration, MEHL-Soup only needs to load a few fine-tuned models and build a computational graph with one combined model. We further extend…
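The block coordinate idea in the abstract can be illustrated with a toy sketch in plain Python. It is not the authors' implementation. The parameterization (an anchor equal to the uniform average of all models, plus a weighted sum of each model's offset from it), the quadratic loss, and the names `load_model`, `loss_and_grad`, and `block_coordinate_soup` are all assumptions made for illustration. The sketch only shows how a few mixing coefficients can be updated per step while a single combined model stays in memory.

```python
import random

def load_model(store, i):
    """Hypothetical stand-in for reading one fine-tuned checkpoint from disk."""
    return store[i]

def loss_and_grad(theta, target):
    """Toy quadratic loss 0.5 * ||theta - target||^2 and its gradient w.r.t. theta."""
    diff = [t - y for t, y in zip(theta, target)]
    return 0.5 * sum(d * d for d in diff), diff

def block_coordinate_soup(store, target, block_size=2, steps=200, lr=0.5, seed=0):
    """Learn mixing coefficients alpha for the (assumed) parameterization
    theta = anchor + sum_i alpha[i] * (model_i - anchor),
    touching only `block_size` models per step."""
    rng = random.Random(seed)
    n, dim = len(store), len(store[0])

    # Anchor: uniform average, accumulated one model at a time.
    anchor = [0.0] * dim
    for i in range(n):
        anchor = [a + w / n for a, w in zip(anchor, load_model(store, i))]

    alpha = [0.0] * n
    theta = list(anchor)  # the single combined model kept in memory

    for _ in range(steps):
        block = rng.sample(range(n), block_size)  # models loaded this step
        _, grad = loss_and_grad(theta, target)    # one gradient at the combined model
        for i in block:
            direction = [w - a for w, a in zip(load_model(store, i), anchor)]
            # Chain rule: dL/dalpha_i = <dL/dtheta, model_i - anchor>.
            g_alpha = sum(g * d for g, d in zip(grad, direction))
            delta = -lr * g_alpha
            alpha[i] += delta
            # Update the combined model incrementally, without the other models.
            theta = [t + delta * d for t, d in zip(theta, direction)]
    return alpha, theta
```

For example, calling `block_coordinate_soup([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.9, 0.2])` drives the toy loss close to zero while each step reads only two of the three stored models. In a real setting the gradient would come from backpropagation through the combined network on a mini-batch, and the step size would need tuning.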

Code & Models

Repositories

nblt/mehl-soup (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings