Towards Convergence Rates for Parameter Estimation in Gaussian-gated Mixture of Experts
Huy Nguyen, TrungTin Nguyen, Khai Nguyen, Nhat Ho

TL;DR
This paper analyzes the convergence rates of maximum likelihood estimation (MLE) in Gaussian-gated mixture of experts (MoE) models. It shows that the estimator behaves differently depending on the parameter setting and introduces novel Voronoi loss functions to characterize these rates.
Contribution
It provides the first convergence analysis for MLE in Gaussian-gated MoE models. It uses new Voronoi loss functions to handle the interaction between the Gaussian gating functions and the expert networks, which arises because both depend on the covariates.
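Voronoi loss functions are usually built by grouping the fitted components into Voronoi cells around the true components and then measuring weight and parameter discrepancies within each cell. The sketch below is minimal and assumes that each component is summarized by a mixing weight and a parameter vector. The function names `voronoi_cells` and `voronoi_loss` and the simple first-order loss form are hypothetical illustrations. They are not necessarily the losses constructed in this paper.

```python
import math


def voronoi_cells(fitted, true):
    """Map each true-atom index j to the list of fitted-atom indices nearest to it.

    Ties are broken in favor of the lowest true-atom index (hypothetical choice).
    """
    cells = {j: [] for j in range(len(true))}
    for i, theta in enumerate(fitted):
        nearest = min(range(len(true)), key=lambda j: math.dist(theta, true[j]))
        cells[nearest].append(i)
    return cells


def voronoi_loss(fitted_w, fitted, true_w, true):
    """Illustrative first-order Voronoi loss between a fitted and a true mixing measure.

    For each cell: |total fitted weight - true weight| plus the weighted
    distances of the fitted atoms in that cell to their true atom.
    An empty cell contributes its full true weight, penalizing a missed component.
    """
    loss = 0.0
    for j, members in voronoi_cells(fitted, true).items():
        loss += abs(sum(fitted_w[i] for i in members) - true_w[j])
        loss += sum(fitted_w[i] * math.dist(fitted[i], true[j]) for i in members)
    return loss
```

For example, if two fitted atoms at 0.1 and -0.1, each with weight 0.25, approximate one true atom at 0 with weight 0.5, they fall into the same cell. Their merged weight matches, so only the distance term (0.05 in total) remains. Over-fitted models behave this way, with several fitted components collapsing onto one true component.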
Findings
MLE exhibits different convergence behaviors depending on the location parameters of the Gaussian gating functions.
The solvability of polynomial systems characterizes these behaviors.
Simulation results support the theoretical analysis.
Abstract
Originally introduced as a neural network for ensemble learning, mixture of experts (MoE) has recently become a fundamental building block of highly successful modern deep neural networks for heterogeneous data analysis in several applications of machine learning and statistics. Despite its popularity in practice, a satisfactory theoretical understanding of the MoE model is still far from complete. To shed new light on this problem, we provide a convergence analysis for maximum likelihood estimation (MLE) in the Gaussian-gated MoE model. The main challenge in this analysis comes from the inclusion of covariates in the Gaussian gating functions and expert networks, which leads to their intrinsic interaction via partial differential equations with respect to their parameters. We tackle these issues by designing novel Voronoi loss functions among parameters to accurately capture…
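To make the setting concrete, the sketch below assumes the standard Gaussian-gated MoE with linear Gaussian experts. In this model, the gate for expert k is proportional to pi_k N(x | mu_k, Sigma_k), and expert k predicts y ~ N(a_k x + b_k, nu_k). The covariate x enters both the gates and the experts, which is the source of the interaction described above. For readability, x and y are taken to be scalars. The class and function names (`Expert`, `gating_probs`, `conditional_density`, `log_likelihood`) are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Expert:
    pi: float      # mixing weight of the gate (positive)
    mu: float      # gating location parameter
    sigma2: float  # gating variance
    a: float       # expert slope on the covariate
    b: float       # expert intercept
    nu: float      # expert noise variance


def normal_pdf(z, mean, var):
    """Univariate Gaussian density N(z | mean, var)."""
    return math.exp(-(z - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def gating_probs(x, experts):
    """Gaussian gates: pi_k N(x | mu_k, sigma2_k), normalized over k."""
    scores = [e.pi * normal_pdf(x, e.mu, e.sigma2) for e in experts]
    total = sum(scores)
    return [s / total for s in scores]


def conditional_density(y, x, experts):
    """p(y | x) = sum_k gate_k(x) * N(y | a_k x + b_k, nu_k)."""
    gates = gating_probs(x, experts)
    return sum(g * normal_pdf(y, e.a * x + e.b, e.nu) for g, e in zip(gates, experts))


def log_likelihood(xs, ys, experts):
    """Conditional log-likelihood maximized by the MLE."""
    return sum(math.log(conditional_density(y, x, experts)) for x, y in zip(xs, ys))
```

This code only evaluates the objective. The MLE studied in the paper is the maximizer of `log_likelihood` over all component parameters, and the fitting procedure itself is not shown here.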
Taxonomy
Topics: Bayesian Methods and Mixture Models · Target Tracking and Data Fusion in Sensor Networks · Gaussian Processes and Bayesian Inference
