Semi-Supervised Learning of Noisy Mixture of Experts Models
Oh-Ran Kwon, Gourab Mukherjee, Jacob Bien

TL;DR
This paper introduces a semi-supervised learning method for mixture of experts (MoE) models that tolerates a noisy relationship between the cluster structure of the unlabeled data and the gating function.
Contribution
It relaxes the strong assumption linking latent clusters to gating influence, proposing a least trimmed squares algorithm with theoretical convergence guarantees.
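To make the term "least trimmed squares" concrete, here is a minimal sketch of generic least trimmed squares (LTS) for ordinary linear regression, fitted with random starts and concentration steps. It is not the paper's algorithm for MoE models; the function name `lts_fit`, its parameters, and the toy data are assumptions made for illustration only.

```python
import numpy as np

def lts_fit(X, y, h, n_starts=20, n_csteps=50, seed=0):
    """Generic LTS sketch: minimize the sum of the h smallest squared residuals."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    best_beta, best_obj = None, np.inf
    for _ in range(n_starts):
        # Start from a small random subset (d points give an exact fit).
        subset = rng.choice(n, size=d, replace=False)
        for _ in range(n_csteps):
            # Least squares fit on the current subset only.
            beta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
            resid2 = (y - X @ beta) ** 2
            # Concentration step: keep the h points with the smallest residuals.
            new_subset = np.argsort(resid2)[:h]
            if set(new_subset.tolist()) == set(subset.tolist()):
                break
            subset = new_subset
        # Trimmed objective evaluated at the last fitted beta.
        obj = np.sort(resid2)[:h].sum()
        if obj < best_obj:
            best_beta, best_obj = beta, obj
    return best_beta, best_obj

# Toy data (hypothetical): y = 1 + 2x + noise, with 20 of 100 responses corrupted.
rng = np.random.default_rng(1)
x = rng.normal(size=100)
X = np.column_stack([np.ones(100), x])
y = 1.0 + 2.0 * x + 0.1 * rng.normal(size=100)
y[:20] += 50.0  # gross outliers

beta_lts, obj_lts = lts_fit(X, y, h=75)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print("LTS coefficients:", beta_lts)
print("OLS coefficients:", beta_ols)
```

Because the objective ignores the largest residuals, the corrupted responses have little effect on the LTS fit, while the ordinary least squares fit is pulled toward them. The trimming level `h` is a tuning choice in this sketch.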
Findings
Method performs well with noisy unlabeled data
Achieves near-parametric convergence rates
Effective on simulated and real datasets
Abstract
The mixture of experts (MoE) model is a versatile framework for predictive modeling that has gained renewed interest in the age of large language models. A collection of predictive "experts" is learned along with a "gating function" that controls how much influence each expert is given when a prediction is made. This structure allows relatively simple models to excel in complex, heterogeneous data settings. In many contemporary settings, unlabeled data are widely available while labeled data are difficult to obtain. Semi-supervised learning methods seek to leverage the unlabeled data. We propose a novel method for semi-supervised learning of MoE models. We start from a semi-supervised MoE model that was developed by oceanographers that makes the strong assumption that the latent clustering structure in unlabeled data maps directly to the influence that the gating function should…
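The expert and gating structure described in the abstract can be sketched in a few lines. The example below assumes linear experts and a softmax gating function, which is a common textbook form of MoE and not necessarily the paper's exact model; `moe_predict`, `gate_W`, and `expert_B` are hypothetical names.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def moe_predict(X, gate_W, expert_B):
    """X: (n, d) features; gate_W: (d, K) gating weights; expert_B: (d, K) expert coefficients."""
    gate = softmax(X @ gate_W)    # (n, K): influence given to each expert per observation
    expert_preds = X @ expert_B   # (n, K): each linear expert's prediction
    y_hat = (gate * expert_preds).sum(axis=1)  # gate-weighted combination
    return y_hat, gate

# Toy inputs (hypothetical): 5 observations, intercept plus 2 features, K = 2 experts.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(5), rng.normal(size=(5, 2))])
gate_W = rng.normal(size=(3, 2))
expert_B = rng.normal(size=(3, 2))

y_hat, gate = moe_predict(X, gate_W, expert_B)
print(y_hat)
print(gate.sum(axis=1))  # each row of gate weights sums to 1
```

Each row of `gate` gives the weight placed on each expert for one observation, so the prediction is a convex combination of the experts' outputs.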
Taxonomy
Topics: Grey System Theory Applications
