Horseshoe Mixtures-of-Experts (HS-MoE)

Nick Polson; Vadim Sokolov

arXiv:2601.09043 · stat.ML · January 15, 2026

Open Access

TL;DR

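This paper introduces Horseshoe Mixtures-of-Experts (HS-MoE), a Bayesian model that combines a horseshoe prior with input-dependent gating to achieve data-adaptive sparsity in expert selection. Inference uses a particle learning algorithm, and the authors relate the model to the mixture-of-experts layers used in large language models.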

Contribution

The paper presents a Bayesian framework for sparse mixture-of-experts models. Its main methodological contribution is a particle learning algorithm for sequential inference, and it discusses connections to mixture-of-experts layers in large language model architectures.
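To make particle learning concrete, here is a minimal sketch for a much simpler model than HS-MoE. It is not the authors' algorithm. It assumes a mixture of Gaussian experts with known noise variance `sigma2`, N(0, `tau2`) priors on the expert means, and fixed, input-independent gate weights. Each particle stores only sufficient statistics (a count and a sum per expert). At every step the particles are first resampled in proportion to the predictive density of the new observation; the allocation is then sampled and the statistics are updated. The function names and the settings `n_particles`, `sigma2`, and `tau2` are hypothetical.

```python
import math
import random


def normal_pdf(y, mean, var):
    """Density of N(mean, var) at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posterior_mean_var(n, s, sigma2, tau2):
    """Posterior of an expert mean mu ~ N(0, tau2) after n observations
    with sum s, each observation ~ N(mu, sigma2)."""
    prec = 1.0 / tau2 + n / sigma2
    return (s / sigma2) / prec, 1.0 / prec


def allocation_weights(stats, y, gate, sigma2, tau2):
    """gate_k * p(y | expert k, sufficient statistics) for every expert k."""
    out = []
    for (n, s), g in zip(stats, gate):
        m, v = posterior_mean_var(n, s, sigma2, tau2)
        out.append(g * normal_pdf(y, m, sigma2 + v))  # posterior predictive density
    return out


def particle_learning(ys, n_experts=2, n_particles=200, sigma2=1.0, tau2=100.0, seed=0):
    """Hypothetical particle learning filter for a Gaussian mixture of experts."""
    rng = random.Random(seed)
    gate = [1.0 / n_experts] * n_experts  # simplification: fixed, input-independent gate
    # Each particle carries only sufficient statistics: (count, sum) per expert.
    particles = [[(0, 0.0)] * n_experts for _ in range(n_particles)]
    for y in ys:
        # Step 1 (resample): weight each particle by p(y_t | s_{t-1}).
        alloc = [allocation_weights(p, y, gate, sigma2, tau2) for p in particles]
        pred = [sum(a) for a in alloc]
        idx = rng.choices(range(n_particles), weights=pred, k=n_particles)
        # Step 2 (propagate): draw z_t ~ p(z | y_t, s_{t-1}), then update the stats.
        new_particles = []
        for i in idx:
            k = rng.choices(range(n_experts), weights=alloc[i], k=1)[0]
            stats = list(particles[i])
            n, s = stats[k]
            stats[k] = (n + 1, s + y)
            new_particles.append(stats)
        particles = new_particles
    return particles


def sorted_posterior_means(particles, sigma2=1.0, tau2=100.0):
    """Average the sorted expert means over particles (sorting avoids label switching)."""
    rows = [sorted(posterior_mean_var(n, s, sigma2, tau2)[0] for n, s in p) for p in particles]
    return [sum(col) / len(rows) for col in zip(*rows)]


if __name__ == "__main__":
    ys = [5.0, -5.0, 4.8, -5.2, 5.1, -4.9] * 5
    particles = particle_learning(ys)
    print(sorted_posterior_means(particles))
```

Resampling before propagating is what separates particle learning from a plain bootstrap filter: particles are weighted by how well their current sufficient statistics predict the next observation before any allocation is drawn, so memory per particle stays fixed as data arrive.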

Findings

01

Effective sparse expert selection via horseshoe prior

02

Particle learning algorithm for sequential inference

03

Relevance to large language models with extreme sparsity

Abstract

Horseshoe mixtures-of-experts (HS-MoE) models provide a Bayesian framework for sparse expert selection in mixture-of-experts architectures. We combine the horseshoe prior's adaptive global-local shrinkage with input-dependent gating, yielding data-adaptive sparsity in expert usage. Our primary methodological contribution is a particle learning algorithm for sequential inference, in which the filter is propagated forward in time while tracking only sufficient statistics. We also discuss how HS-MoE relates to modern mixture-of-experts layers in large language models, which are deployed under extreme sparsity constraints (e.g., activating a small number of experts per token out of a large pool).
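The sketch below illustrates the two ingredients the abstract combines, under assumptions that the abstract does not specify. It uses a softmax gate whose logits are linear in the input, w_k · x, with a horseshoe prior on every gating weight: w_kj ~ N(0, λ_kj² τ²), λ_kj ~ C+(0, 1) (local scale), and τ ~ C+(0, 1) (global scale). For comparison, it adds the hard top-k routing used by sparse mixture-of-experts layers in large language models. The paper's exact parameterization may differ, and the names `sample_horseshoe_gate`, `gate_probs`, and `top_k_route` are hypothetical.

```python
import math
import random


def half_cauchy(rng, scale=1.0):
    """C+(0, scale) via the inverse CDF F(x) = (2 / pi) * atan(x / scale)."""
    return scale * math.tan(0.5 * math.pi * rng.random())


def sample_horseshoe_gate(n_experts, n_features, rng):
    """Hypothetical prior draw of gating weights:
    w[k][j] ~ N(0, (lambda_kj * tau)^2), lambda_kj ~ C+(0, 1), tau ~ C+(0, 1)."""
    tau = half_cauchy(rng)  # global scale shared by all weights
    return [[rng.gauss(0.0, half_cauchy(rng) * tau)  # local scale lambda_kj per weight
             for _ in range(n_features)]
            for _ in range(n_experts)]


def gate_probs(weights, x):
    """Input-dependent softmax gate: g_k(x) proportional to exp(w_k . x)."""
    logits = [sum(wj * xj for wj, xj in zip(w, x)) for w in weights]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(probs, k):
    """LLM-style sparse routing: keep the k largest gates and renormalize."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


if __name__ == "__main__":
    rng = random.Random(1)
    w = sample_horseshoe_gate(n_experts=8, n_features=4, rng=rng)
    g = gate_probs(w, [1.0, -0.5, 0.2, 0.0])
    print([round(p, 3) for p in g])
    print(top_k_route(g, k=2))
```

The horseshoe's marginal prior on each weight has an unbounded spike at zero and Cauchy-like tails, so most gating weights are shrunk strongly while a few stay large; in a Bayesian fit, the posterior over λ and τ lets the data set the degree of sparsity. Hard top-k routing instead fixes the number of active experts per token in advance.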

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Machine Learning and Algorithms · Advanced Bandit Algorithms Research