On Bayesian Softmax-Gated Mixture-of-Experts Models
Nicola Bariletto, Huy Nguyen, Nhat Ho, Alessandro Rinaldo

TL;DR
This paper provides a comprehensive theoretical analysis of Bayesian mixture-of-experts models with softmax gating, focusing on their asymptotic behavior in density estimation, parameter estimation, and model selection.
Contribution
It establishes posterior contraction rates for density estimation and convergence guarantees for parameter estimation, and proposes strategies for selecting the number of experts, extending the theory of Bayesian mixture-of-experts models.
Findings
Established posterior contraction rates for density estimation.
Derived convergence guarantees for parameter estimation.
Proposed strategies for selecting the number of experts.
Abstract
Mixture-of-experts models provide a flexible framework for learning complex probabilistic input-output relationships by combining multiple expert models through an input-dependent gating mechanism. These models have become increasingly prominent in modern machine learning, yet their theoretical properties in the Bayesian framework remain largely unexplored. In this paper, we study Bayesian mixture-of-experts models, focusing on the ubiquitous softmax-based gating mechanism. Specifically, we investigate the asymptotic behavior of the posterior distribution for three fundamental statistical tasks: density estimation, parameter estimation, and model selection. First, we establish posterior contraction rates for density estimation, both in the regime with a fixed, known number of experts and in the regime with a random, learnable number of experts. We then analyze parameter estimation and derive…
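To make the softmax gating mechanism concrete, the sketch below evaluates the conditional density of a softmax-gated mixture of experts, p(y | x) = sum_k softmax_k(w_k·x + b_k) · N(y; a_k·x + c_k, s_k²). It assumes Gaussian experts with linear means and affine gating logits; this is a common textbook specification chosen for illustration, not necessarily the exact model or parameterization used in the paper, and all function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_pdf(y: float, mean: float, sd: float) -> float:
    """Density of N(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def moe_density(
    y: float,
    x: Sequence[float],
    gate_w: Sequence[Sequence[float]],   # hypothetical: gating weights w_k
    gate_b: Sequence[float],             # hypothetical: gating intercepts b_k
    expert_w: Sequence[Sequence[float]], # hypothetical: expert mean slopes a_k
    expert_b: Sequence[float],           # hypothetical: expert mean intercepts c_k
    expert_sd: Sequence[float],          # hypothetical: expert standard deviations s_k
) -> float:
    """Conditional density p(y | x) of a softmax-gated Gaussian mixture of experts."""
    # Input-dependent mixing weights from the softmax gate.
    logits = [dot(w, x) + b for w, b in zip(gate_w, gate_b)]
    weights = softmax(logits)
    # Weighted sum of Gaussian expert densities with linear means.
    density = 0.0
    for pi_k, a, c, s in zip(weights, expert_w, expert_b, expert_sd):
        density += pi_k * gaussian_pdf(y, dot(a, x) + c, s)
    return density


if __name__ == "__main__":
    # Two experts with a one-dimensional input (illustrative values only).
    params = dict(
        gate_w=[[1.0], [-1.0]],
        gate_b=[0.0, 0.5],
        expert_w=[[2.0], [-1.0]],
        expert_b=[0.0, 1.0],
        expert_sd=[1.0, 0.5],
    )
    print(moe_density(0.5, [0.3], **params))
```

Adding the same constant to every gating logit leaves the softmax weights unchanged, so in this parameterization the gating parameters are identified only up to a common shift.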
