GLUE: Gradient-free Learning to Unify Experts
Jong-Ik Park, Shreyas Chaudhari, Srinivasa Pranav, Carlee Joe-Wong, José M. F. Moura

TL;DR
GLUE introduces a gradient-free method for unifying multiple expert models into a single, well-initialized model that can be fine-tuned to outperform traditional blending techniques across various datasets and architectures.
Contribution
GLUE uses simultaneous perturbation stochastic approximation (SPSA), a gradient-free optimizer, to learn the mixture coefficients over expert models, improving both the initialization and subsequent fine-tuning performance.
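To make the idea concrete, here is a minimal sketch of learning mixture coefficients with SPSA on a toy problem. It is not the authors' implementation. The assumptions are: the experts are flat parameter vectors of a linear model, the convex-combination constraint is enforced with a softmax over unconstrained logits (the source does not say how the constraint is handled), and the step sizes `a` and `c` are fixed. All function names (`softmax`, `blend`, `target_loss`, `spsa_mix`) are hypothetical.

```python
import numpy as np


def softmax(z):
    # Map unconstrained logits to non-negative weights that sum to 1.
    e = np.exp(z - z.max())
    return e / e.sum()


def blend(experts, alpha):
    # experts: (K, d) array of flattened expert parameters; alpha: (K,) weights.
    return alpha @ experts


def target_loss(w, X, y):
    # Mean squared error of a linear model on the target-domain data (toy stand-in).
    r = X @ w - y
    return float(np.mean(r ** 2))


def spsa_mix(experts, X, y, steps=2000, a=0.02, c=0.01, seed=0):
    # Learn mixture logits with SPSA: two loss evaluations per step, no backprop.
    rng = np.random.default_rng(seed)
    K = experts.shape[0]
    z = np.zeros(K)  # zero logits correspond to uniform mixing

    def f(logits):
        return target_loss(blend(experts, softmax(logits)), X, y)

    for _ in range(steps):
        # Rademacher perturbation: every coordinate is +1 or -1.
        delta = rng.choice([-1.0, 1.0], size=K)
        # Two-sided finite difference along delta. For +/-1 entries,
        # dividing by delta is the same as multiplying by it.
        g_hat = (f(z + c * delta) - f(z - c * delta)) / (2.0 * c) * delta
        z -= a * g_hat
    return softmax(z)


# Toy data (an assumption): the target is an interior convex mix of three experts.
data_rng = np.random.default_rng(1)
experts = data_rng.normal(size=(3, 5))
alpha_true = np.array([0.6, 0.3, 0.1])
X = data_rng.normal(size=(200, 5))
y = X @ blend(experts, alpha_true)

alpha_uniform = np.full(3, 1.0 / 3.0)
alpha_spsa = spsa_mix(experts, X, y)

loss_uniform = target_loss(blend(experts, alpha_uniform), X, y)
loss_spsa = target_loss(blend(experts, alpha_spsa), X, y)
print("uniform-mix loss:", loss_uniform)
print("SPSA-mix loss:   ", loss_spsa)
print("learned weights: ", alpha_spsa)
```

Each SPSA step costs two forward evaluations of the blended model regardless of the number of experts, which is what makes the approach attractive when backpropagating through the full network is expensive. In a real setting the blended vector would be loaded into the target network and `target_loss` would be a forward pass on target-domain data.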
Findings
GLUE outperforms data-size weighting by up to 8.5% in accuracy.
GLUE surpasses proxy-metric selection by up to 9.1%.
GLUE either exceeds full-gradient mixing or matches its performance to within 1.4%.
Abstract
In many deployed systems (multilingual ASR, cross-hospital imaging, region-specific perception), multiple pretrained specialist models coexist. Yet, new target domains often require domain expansion: a generalized model that performs well beyond any single specialist's domain. Given a new target domain, existing methods obtain a single strong initialization prior for the model parameters by blending expert models to initialize a target model. However, heuristic blending -- using mixing coefficients based on data size or proxy metrics -- often yields lower target-domain test accuracy, and learning these coefficients on the target domain's loss function typically requires computationally expensive full backpropagation through a neural network. We propose GLUE, Gradient-free Learning to Unify Experts, which initializes the target model as a convex combination of fixed experts and learns…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications
