GLUE: Gradient-free Learning to Unify Experts

Jong-Ik Park; Shreyas Chaudhari; Srinivasa Pranav; Carlee Joe-Wong; José M. F. Moura

arXiv:2512.22467 · cs.LG · January 30, 2026

Open Access

TL;DR

GLUE introduces a gradient-free method for unifying multiple expert models into a single, well-initialized model that can be fine-tuned to outperform traditional blending techniques across various datasets and architectures.

Contribution

GLUE learns the mixture coefficients over expert models with simultaneous perturbation stochastic approximation (SPSA), a gradient-free method, which improves both the initialization and subsequent fine-tuning performance.
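As an illustration of the idea, here is a minimal sketch of SPSA applied to mixture coefficients, not the authors' implementation. It assumes each expert is a flattened parameter vector, keeps the coefficients convex through a softmax over free logits, uses constant step sizes, and scores the blend with a toy quadratic loss. The names `softmax`, `blend`, `spsa_mixture`, and every hyperparameter value are hypothetical choices made for this example. Each iteration needs only two loss evaluations and no backpropagation.

```python
import math
import random


def softmax(logits):
    """Map free logits to nonnegative coefficients that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def blend(experts, alpha):
    """Convex combination of flattened expert parameter vectors."""
    dim = len(experts[0])
    return [sum(a * e[i] for a, e in zip(alpha, experts)) for i in range(dim)]


def spsa_mixture(experts, loss_fn, steps=2000, lr=0.3, c=0.1, seed=0):
    """Learn mixture coefficients using two loss evaluations per step."""
    rng = random.Random(seed)
    logits = [0.0] * len(experts)  # start from the uniform mixture
    for _ in range(steps):
        # Perturb all logits at once along a random +/-1 direction.
        delta = [rng.choice((-1.0, 1.0)) for _ in logits]
        plus = [z + c * d for z, d in zip(logits, delta)]
        minus = [z - c * d for z, d in zip(logits, delta)]
        diff = (loss_fn(blend(experts, softmax(plus)))
                - loss_fn(blend(experts, softmax(minus))))
        # SPSA estimate of the k-th partial derivative: diff / (2 * c * delta_k).
        logits = [z - lr * diff / (2.0 * c * d) for z, d in zip(logits, delta)]
    return softmax(logits)


# Toy setup (an assumption for illustration): three 2-D "experts" and a
# target that is itself a convex combination of them.
experts = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
target = blend(experts, [0.6, 0.3, 0.1])


def toy_loss(theta):
    """Squared distance to the target; stands in for a target-domain loss."""
    return sum((x - t) ** 2 for x, t in zip(theta, target))


alpha = spsa_mixture(experts, toy_loss)
print("learned coefficients:", alpha)
print("loss at blend:", toy_loss(blend(experts, alpha)))
```

In a real setting `loss_fn` would run forward passes of the blended network on target-domain data. The softmax parametrization is one convenient way to stay on the simplex; projecting onto the simplex after each update would be another.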

Findings

01. GLUE outperforms data-size weighting by up to 8.5% in accuracy.

02. GLUE outperforms proxy-metric selection by up to 9.1% in accuracy.

03. GLUE either outperforms full-gradient mixing or matches its performance to within 1.4%.

Abstract

In many deployed systems (multilingual ASR, cross-hospital imaging, region-specific perception), multiple pretrained specialist models coexist. Yet, new target domains often require domain expansion: a generalized model that performs well beyond any single specialist's domain. Given a new target domain, existing methods obtain a single strong initialization prior for the model parameters by blending expert models to initialize a target model. However, heuristic blending -- using mixing coefficients based on data size or proxy metrics -- often yields lower target-domain test accuracy, and learning these coefficients on the target domain's loss function typically requires computationally expensive full backpropagation through a neural network. We propose GLUE, Gradient-free Learning to Unify Experts, which initializes the target model as a convex combination of fixed experts and learns…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications