Scalable Transfer Learning with Expert Models
Joan Puigcerver, Carlos Riquelme, Basil Mustafa, Cedric Renggli,, Andr\'e Susano Pinto, Sylvain Gelly, Daniel Keysers, Neil Houlsby

TL;DR
This paper presents a scalable transfer learning method using expert models, which efficiently selects relevant experts for new tasks, significantly reducing computation and outperforming baselines on diverse vision tasks.
Contribution
The paper introduces a novel expert-based transfer learning strategy that scales efficiently without revisiting pre-training data and compresses multiple experts into a single adaptable model.
Findings
Achieves 2-3 orders of magnitude speed-up over existing methods.
Outperforms baseline models on over 20 diverse vision tasks.
Demonstrates effective expert selection using performance proxies.
Abstract
Transfer of pre-trained representations can improve sample efficiency and reduce computational requirements for new tasks. However, representations used for transfer are usually generic, and are not tailored to a particular distribution of downstream tasks. We explore the use of expert representations for transfer with a simple, yet effective, strategy. We train a diverse set of experts by exploiting existing label structures, and use cheap-to-compute performance proxies to select the relevant expert for each target task. This strategy scales the process of transferring to new tasks, since it does not revisit the pre-training data during transfer. Accordingly, it requires little extra compute per target task, and results in a speed-up of 2-3 orders of magnitude compared to competing approaches. Further, we provide an adapter-based architecture able to compress many experts into a single…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
Taxonomy
TopicsDomain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications
