Learning Scalable Model Soup on a Single GPU: An Efficient Subspace Training Strategy
Tao Li, Weisen Jiang, Fanghui Liu, Xiaolin Huang, James T. Kwok

TL;DR
This paper introduces MEHL-Soup, a memory-efficient and faster method for model soups that enables scalable hyperparameter tuning on a single GPU by formulating the problem as hyperplane optimization.
Contribution
The paper proposes MEHL-Soup, a novel hyperplane optimization approach for model soups that significantly reduces memory and computation costs, enabling single-GPU training.
Findings
MEHL-Soup outperforms Learned-Soup in test accuracy.
Memory usage is reduced by more than 13 times.
Soup construction speed is increased by 9 times.
Abstract
Pre-training followed by fine-tuning is widely adopted among practitioners. The performance can be improved by "model soups"~\cite{wortsman2022model} via exploring various hyperparameter configurations.The Learned-Soup, a variant of model soups, significantly improves the performance but suffers from substantial memory and time costs due to the requirements of (i) having to load all fine-tuned models simultaneously, and (ii) a large computational graph encompassing all fine-tuned models. In this paper, we propose Memory Efficient Hyperplane Learned Soup (MEHL-Soup) to tackle this issue by formulating the learned soup as a hyperplane optimization problem and introducing block coordinate gradient descent to learn the mixing coefficients. At each iteration, MEHL-Soup only needs to load a few fine-tuned models and build a computational graph with one combined model. We further extend…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSpeech Recognition and Synthesis · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM
MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
