Fast Model Selection and Stable Optimization for Softmax-Gated Multinomial-Logistic Mixture of Experts Models
TrungKhang Tran, TrungTin Nguyen, Md Abul Bashar, Nhat Ho, Richi Nayak, Christopher Drovandi

TL;DR
This paper introduces a stable, efficient training algorithm for softmax-gated Mixture-of-Experts models, along with a principled method for selecting the number of experts, validated on biological data.
Contribution
It develops a globally convergent MM algorithm for softmax-gated MoE and proposes a new model selection method based on dendrograms, improving stability and interpretability.
Findings
Guaranteed monotone ascent and convergence of the training algorithm.
Near-parametric optimal rates for model selection and parameter recovery.
Improved accuracy and calibration on biological protein interaction data.
Abstract
Mixture-of-Experts (MoE) architectures combine specialized predictors through a learned gate and are effective across regression and classification, but for classification with softmax multinomial-logistic gating, rigorous guarantees for stable maximum-likelihood training and principled model selection remain limited. We address both issues in the full-data (batch) regime. First, we derive a batch minorization-maximization (MM) algorithm for softmax-gated multinomial-logistic MoE using an explicit quadratic minorizer, yielding coordinate-wise closed-form updates that guarantee monotone ascent of the objective and global convergence to a stationary point (in the standard MM sense), avoiding approximate M-steps common in EM-type implementations. Second, we prove finite-sample rates for conditional density estimation and parameter recovery, and we adapt dendrograms of mixing measures to…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGaussian Processes and Bayesian Inference · Machine Learning and Algorithms · Bayesian Methods and Mixture Models
