Mean-field limit from general mixtures of experts to quantum neural networks
Anderson Melchor Hernandez, Davide Pastorello, Giacomo De Palma

TL;DR
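This paper analyzes the asymptotic behavior of Mixture of Experts (MoE) models trained with gradient flow. It proves propagation of chaos as the number of experts diverges, shows that the empirical measure of the experts' parameters stays close to a solution of a nonlinear continuity equation, and applies the result to quantum neural networks.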
Contribution
The paper establishes propagation of chaos for an MoE as the number of experts diverges, links the distribution of the experts' parameters to a nonlinear continuity equation with an explicit convergence rate, and applies the result to an MoE generated by a quantum neural network.
Findings
The empirical measure of the experts' parameters is close to a probability measure that solves a nonlinear continuity equation.
The convergence rate is explicit and depends solely on the number of experts.
The results are applied to an MoE generated by a quantum neural network.
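The setting behind these findings can be illustrated with a minimal numerical sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the toy expert a * tanh(w * x), the uniform averaging of experts, the squared loss, the synthetic data, and the Euler discretization of gradient flow. The function name train_moe is hypothetical. The sketch only shows what "empirical measure of the experts' parameters" refers to: the cloud of rows of theta after training.

```python
import numpy as np

def train_moe(n_experts, steps=200, lr=0.05, seed=0):
    """Euler-discretized gradient flow for an average of toy experts.

    Assumed (not from the paper): expert i computes a_i * tanh(w_i * x),
    the model output is the plain average over experts, loss is squared error.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(-1.0, 1.0, 32)           # synthetic inputs
    y = np.sin(np.pi * x)                    # synthetic targets
    theta = rng.normal(size=(n_experts, 2))  # row i holds (a_i, w_i), i.i.d. init
    losses = []
    for _ in range(steps):
        a, w = theta[:, 0], theta[:, 1]
        h = np.tanh(np.outer(w, x))              # shape (n_experts, n_samples)
        pred = (a[:, None] * h).mean(axis=0)     # average over experts
        resid = pred - y
        losses.append(0.5 * np.mean(resid ** 2))
        # Per-expert gradients, multiplied by n_experts (mean-field time scaling)
        grad_a = (h * resid).mean(axis=1)
        grad_w = (a[:, None] * (1.0 - h ** 2) * x * resid).mean(axis=1)
        theta = theta - lr * np.stack([grad_a, grad_w], axis=1)
    return theta, np.array(losses)

# The rows of `theta` are the atoms of the empirical measure of the parameters.
theta, losses = train_moe(n_experts=100)
print(theta.shape, losses[0], losses[-1])
```

Rerunning with larger n_experts and comparing histograms of the rows of theta gives an informal picture of the empirical measure stabilizing; it does not reproduce the paper's convergence rate.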
Abstract
In this work, we study the asymptotic behavior of Mixture of Experts (MoE) trained via gradient flow on supervised learning problems. Our main result establishes the propagation of chaos for a MoE as the number of experts diverges. We demonstrate that the corresponding empirical measure of their parameters is close to a probability measure that solves a nonlinear continuity equation, and we provide an explicit convergence rate that depends solely on the number of experts. We apply our results to a MoE generated by a quantum neural network.
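For orientation, the objects named in the abstract can be written in the form that is standard in the mean-field literature on wide networks. The notation below (expert map \phi, parameters \theta_i, potential \Psi, squared loss) is an assumption for illustration and may differ from the paper's own setup and hypotheses.

```latex
% Assumed notation: N experts with parameters \theta_1,\dots,\theta_N
\begin{align*}
  f_N(x) &= \frac{1}{N}\sum_{i=1}^{N} \phi(x;\theta_i),
  &
  \mu^N_t &= \frac{1}{N}\sum_{i=1}^{N} \delta_{\theta_i(t)}
  \quad \text{(empirical measure)}, \\
  \Psi(\theta;\mu) &= \mathbb{E}_{(x,y)}\!\left[
      \Big(\int \phi(x;\theta')\,\mathrm{d}\mu(\theta') - y\Big)\,\phi(x;\theta)\right],
  &
  \dot{\theta}_i(t) &= -\nabla_\theta \Psi\big(\theta_i(t);\mu^N_t\big), \\
  \partial_t \mu_t &= \nabla_\theta \cdot \big(\mu_t\, \nabla_\theta \Psi(\theta;\mu_t)\big)
  && \text{(nonlinear continuity equation)}.
\end{align*}
```

In this language, propagation of chaos means that as N grows, any fixed finite group of experts behaves asymptotically like independent samples from \mu_t, and the empirical measure \mu^N_t approaches \mu_t.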
