A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts
Huy Nguyen, Pedram Akbarian, TrungTin Nguyen, Nhat Ho

TL;DR
This paper establishes convergence rates for density and parameter estimation in softmax gating multinomial logistic MoE models for classification, shows that these rates become slower than polynomial when part of the expert parameters vanish, and proposes modified gating functions to improve the estimation rates.
Contribution
The paper provides the first theoretical convergence analysis for classification MoE models and introduces modified softmax gating functions to enhance estimation rates.
Findings
Convergence rates for density and parameter estimation are established.
When part of the expert parameters vanish, the rates are slower than polynomial rates, owing to an interaction between the softmax gating and expert functions.
Modified gating functions significantly improve estimation rates.
Abstract
The mixture-of-experts (MoE) model combines the power of multiple submodels via gating functions to achieve greater performance in numerous regression and classification applications. From a theoretical perspective, there have been previous attempts to understand the behavior of this model in the regression setting through the convergence analysis of maximum likelihood estimation in the Gaussian MoE model, but such an analysis for the classification setting has remained missing from the literature. We close this gap by establishing the convergence rates of density estimation and parameter estimation in the softmax gating multinomial logistic MoE model. Notably, when part of the expert parameters vanish, these rates are shown to be slower than polynomial rates owing to an inherent interaction between the softmax gating and expert functions via partial differential…
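To make the model class named in the abstract concrete, the following is a minimal sketch of how a softmax gating multinomial logistic MoE could assign class probabilities: a softmax gate weights several multinomial logistic experts, and the weighted expert outputs are summed. It assumes the common form in which both the gate and the experts are affine in the input; the function and parameter names (`moe_class_probs`, `gate_W`, `gate_b`, `expert_W`, `expert_b`) and the shapes are illustrative assumptions, not the paper's notation or code.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - np.max(z, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


def moe_class_probs(X, gate_W, gate_b, expert_W, expert_b):
    """Class probabilities under a softmax-gated multinomial logistic MoE.

    Hypothetical shapes (assumptions for this sketch):
      X        : (n, d)    inputs
      gate_W   : (k, d)    gating weights, one row per expert
      gate_b   : (k,)      gating biases
      expert_W : (k, c, d) expert weights, one (c, d) matrix per expert
      expert_b : (k, c)    expert biases
    Returns an (n, c) array whose rows sum to 1.
    """
    # Softmax gate: mixing weight of each expert for each input, shape (n, k).
    gate = softmax(X @ gate_W.T + gate_b, axis=1)
    # Each expert is a multinomial logistic model over c classes, shape (n, k, c).
    logits = np.einsum("nd,kcd->nkc", X, expert_W) + expert_b
    expert = softmax(logits, axis=2)
    # Mixture: gate-weighted sum of the expert class probabilities, shape (n, c).
    return np.einsum("nk,nkc->nc", gate, expert)


# Small random example: 5 inputs, 3 features, 2 experts, 4 classes.
rng = np.random.default_rng(0)
n, d, k, c = 5, 3, 2, 4
X = rng.normal(size=(n, d))
probs = moe_class_probs(
    X,
    rng.normal(size=(k, d)),
    rng.normal(size=k),
    rng.normal(size=(k, c, d)),
    rng.normal(size=(k, c)),
)
```

Because each row of `gate` and each expert's class distribution sums to 1, every row of `probs` is a valid probability vector over the `c` classes.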
Taxonomy
Topics: Survey Sampling and Estimation Techniques · Bayesian Methods and Mixture Models · COVID-19 epidemiological studies
Methods: Softmax
