Convergence Rates for Mixture-of-Experts
Eduardo F. Mendes, Wenxin Jiang

TL;DR
This paper analyzes the convergence rates of mixture-of-experts models with polynomial regression experts, providing theoretical insights into optimal choices of the number of experts and their complexity for better model performance.
Contribution
The paper offers a theoretical study of the convergence rates of mixture-of-experts (ME) models, showing how the number of experts and their polynomial degree affect learning efficiency.
Findings
The convergence rate depends on both the number of experts and the complexity (polynomial order) of each expert.
Certain combinations of the number of experts and the polynomial order yield optimal convergence rates.
The results inform how many experts to use and how to balance expert count against expert complexity.
Abstract
In the mixture-of-experts (ME) model, where a number of submodels (experts) are combined, there have been two longstanding problems: (i) how many experts should be chosen, given the size of the training data? (ii) given the total number of parameters, is it better to use a few very complex experts, or to combine many simple experts? In this paper, we try to provide some insights into these problems through a theoretical study of an ME structure where m experts are mixed, with each expert being related to a polynomial regression model of order k. We study the convergence rate of the maximum likelihood estimator (MLE), in terms of how fast the Kullback-Leibler divergence between the estimated density and the true density converges to zero as the sample size increases. The convergence rate is found to depend on both m and k, and certain choices of m and k are found to…
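To make the structure described in the abstract concrete, the following is a minimal sketch of a mixture-of-experts conditional density with polynomial experts, together with a Monte Carlo estimate of the Kullback-Leibler divergence. It is an illustration under stated assumptions, not the paper's exact model class: the number of experts is called m and the polynomial order k here, the input x is scalar, the gate is a softmax that is linear in x, and each expert is Gaussian with a fixed standard deviation. All function and parameter names are hypothetical.

```python
import numpy as np


def poly_features(x, k):
    """Return the design matrix [1, x, x^2, ..., x^k], shape (n, k + 1)."""
    x = np.asarray(x, dtype=float)
    return np.vander(x, N=k + 1, increasing=True)


def me_conditional_density(y, x, gate_w, expert_coef, sigma=1.0):
    """Evaluate p(y | x) for a mixture of m polynomial-mean Gaussian experts.

    gate_w:      (m, 2) array; row j holds the intercept and slope of gate logit j
                 (assumed softmax gate, linear in x).
    expert_coef: (m, k + 1) array; row j holds the polynomial coefficients
                 of expert j's mean, lowest order first.
    sigma:       common expert standard deviation (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = expert_coef.shape[1] - 1

    # Softmax gating weights, shape (n, m); subtract the row max for stability.
    logits = gate_w[:, 0][None, :] + np.outer(x, gate_w[:, 1])
    logits = logits - logits.max(axis=1, keepdims=True)
    gates = np.exp(logits)
    gates = gates / gates.sum(axis=1, keepdims=True)

    # Polynomial expert means, shape (n, m).
    means = poly_features(x, k) @ expert_coef.T

    # Gaussian density of y under each expert, then mix with the gates.
    z = (y[:, None] - means) / sigma
    dens = np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))
    return (gates * dens).sum(axis=1)


def kl_divergence_mc(true_density, est_density, y, x):
    """Monte Carlo estimate of KL(true || est) from samples (x, y) drawn from the truth."""
    return float(np.mean(np.log(true_density(y, x)) - np.log(est_density(y, x))))
```

In this parameterization the experts contribute m(k + 1) mean coefficients and the gate contributes 2m weights, so raising either m or k increases the parameter count; this is the trade-off the abstract's second question refers to.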
Taxonomy
Topics: Bayesian Methods and Mixture Models · Statistical Methods and Inference · Statistical Methods and Bayesian Inference
