Rethinking Multinomial Logistic Mixture of Experts with Sigmoid Gating Function
Tuan Minh Pham, Thinh Cao, Viet Nguyen, Huy Nguyen, Nhat Ho, Alessandro Rinaldo

TL;DR
This paper analyzes the benefits and limitations of sigmoid gating in mixture-of-experts models, proposing modifications to improve convergence and sample complexity, especially in classification tasks.
Contribution
It provides a comprehensive theoretical analysis of sigmoid gates in MoE, introduces a Euclidean scoring method, and addresses convergence and sample complexity issues.
Findings
The sigmoid gate has lower sample complexity than the softmax gate for parameter estimation.
Incorporating a temperature parameter into the sigmoid gate can lead to exponential sample complexity.
Replacing the inner product score with a Euclidean score improves sample complexity to polynomial order.
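The findings above contrast two score functions inside the gate and the role of a temperature. Below is a minimal NumPy sketch of those pieces. It assumes an inner-product score s_i(x) = beta_i^T x + b_i, a Euclidean score s_i(x) = -||x - beta_i||^2, and a temperature tau that divides the score. These parameterizations and all function names are hypothetical illustrations, not the paper's exact definitions, and the code does not reproduce any sample-complexity result.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable identity: sigmoid(z) = (1 + tanh(z / 2)) / 2 (no exp overflow).
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def inner_product_score(x, beta, b):
    # Hypothetical inner-product score: s_i(x) = beta_i^T x + b_i.
    # beta: (K, d) expert gating vectors, b: (K,) biases, x: (d,).
    return beta @ x + b


def euclidean_score(x, beta):
    # Hypothetical Euclidean score: s_i(x) = -||x - beta_i||^2.
    # Larger when x is closer to the expert's gating vector beta_i.
    return -np.sum((x - beta) ** 2, axis=1)


def sigmoid_gate(scores, tau=1.0):
    # Each expert gets an independent weight in (0, 1); weights need not sum to 1.
    return sigmoid(scores / tau)


def softmax_gate(scores, tau=1.0):
    # Experts compete: weights are positive and sum to 1.
    z = scores / tau
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


if __name__ == "__main__":
    x = np.array([1.0, 0.0])
    beta = np.array([[1.0, 0.0], [0.0, 1.0]])
    s_ip = inner_product_score(x, beta, np.zeros(2))
    s_eu = euclidean_score(x, beta)
    print("sigmoid / inner product:", sigmoid_gate(s_ip))
    print("sigmoid / Euclidean:   ", sigmoid_gate(s_eu))
    print("softmax / inner product:", softmax_gate(s_ip))
    print("sigmoid, tau = 0.1:    ", sigmoid_gate(s_ip, tau=0.1))
```

Unlike the softmax gate, the sigmoid weights are computed independently per expert, so they do not sum to one. Shrinking tau pushes each sigmoid weight with a nonzero score toward 0 or 1.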
Abstract
The sigmoid gate in mixture-of-experts (MoE) models has been empirically shown to outperform the softmax gate across several tasks, ranging from approximating feed-forward networks to language modeling. Additionally, recent efforts have demonstrated that the sigmoid gate is provably more sample-efficient than its softmax counterpart under regression settings. Nevertheless, there are three notable concerns that have not been addressed in the literature, namely (i) the benefits of the sigmoid gate have not been established under classification settings; (ii) existing sigmoid-gated MoE models may not converge to their ground-truth; and (iii) the effects of a temperature parameter in the sigmoid gate remain theoretically underexplored. To tackle these open problems, we perform a comprehensive analysis of multinomial logistic MoE equipped with a modified sigmoid gate to ensure model…
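The abstract's setting, a multinomial logistic MoE, can be sketched as a mixture of softmax (multinomial logistic) classifiers weighted by sigmoid gates. The abstract is truncated before it specifies the modified sigmoid gate, so the renormalization step below is a hypothetical stand-in chosen only so that the output is a valid probability vector; it is not the paper's method. Function names and array shapes are likewise assumptions.

```python
import numpy as np


def sigmoid(z):
    # Stable sigmoid via the tanh identity.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def moe_class_probs(x, gate_w, gate_b, expert_W, expert_b):
    """Class probabilities p(y | x) for a hypothetical K-expert, C-class MoE.

    gate_w: (K, d), gate_b: (K,)          sigmoid gating parameters
    expert_W: (K, C, d), expert_b: (K, C)  multinomial logistic experts
    """
    # Independent sigmoid weight per expert, using an inner-product score.
    g = sigmoid(gate_w @ x + gate_b)
    # ASSUMPTION: renormalize so the mixture is a valid distribution.
    # Illustrative only; the paper's modified gate is not reproduced here.
    w = g / g.sum()
    # Each expert is a softmax classifier over the C classes: shape (K, C).
    expert_probs = np.stack(
        [softmax(expert_W[i] @ x + expert_b[i]) for i in range(len(g))]
    )
    # Weighted mixture of the experts' class distributions: shape (C,).
    return w @ expert_probs
```

Without some normalization, unnormalized sigmoid weights multiplying class probabilities would generally not sum to one, which is one reason a classification MoE cannot simply reuse the regression form of the sigmoid gate.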
Taxonomy
Topics: Bayesian Methods and Mixture Models · Mobile Crowdsensing and Crowdsourcing · Machine Learning and Algorithms
