Boundary Mass and the Soft-to-Hard Limit in Mixture-of-Experts
Reza Rastegar

TL;DR
This paper analyzes the transition from soft to hard routing in mixture-of-experts models, revealing that the limit is governed by a thin geometric layer around routing interfaces, with implications for model behavior and recovery.
Contribution
It provides a geometric and probabilistic analysis of the soft-to-hard limit in MoE models, including risk bounds and landscape transfer principles.
Findings
Boundary mass is linear in the slab width; in the binary case the leading constant is a surface integral over the routing interface.
Under compactness and uniform margin control, the soft objectives $\Gamma$-converge to the hard-routing objective, so soft routing inherits properties of the hard-routing problem.
The zero-temperature limit is governed by a boundary layer, not the entire input space.
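The linear scaling of boundary mass can be checked numerically on a toy case. The sketch below is not the paper's construction: it assumes a two-expert router whose score gap is |x_0|, a standard Gaussian input law in two dimensions, and a slab defined as the set where the top-two margin is at most delta. The names `toy_router`, `gaussian_sampler`, and `boundary_mass` are hypothetical. For this toy case the mass is erf(delta / sqrt(2)), which is approximately delta times 2 / sqrt(2 pi) for small delta.

```python
import math
import random


def top_two_margin(scores):
    """Gap between the largest and second-largest router score."""
    first, second = sorted(scores, reverse=True)[:2]
    return first - second


def boundary_mass(router, sampler, delta, n_samples=200_000, seed=0):
    """Monte Carlo estimate of P(top-two margin <= delta)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x = sampler(rng)
        if top_two_margin(router(x)) <= delta:
            hits += 1
    return hits / n_samples


def toy_router(x):
    # Assumed toy router: two experts with scores +x0/2 and -x0/2,
    # so the top-two margin equals |x0| and the interface is {x0 = 0}.
    return (0.5 * x[0], -0.5 * x[0])


def gaussian_sampler(rng):
    # Assumed input law: standard Gaussian in two dimensions.
    return (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))


def exact_mass(delta):
    """Closed form for the toy case: P(|x0| <= delta) with x0 ~ N(0, 1)."""
    return math.erf(delta / math.sqrt(2.0))


# Leading constant for the toy case: twice the Gaussian density at 0,
# i.e. the input density integrated along the interface {x0 = 0}.
LEADING_CONSTANT = 2.0 / math.sqrt(2.0 * math.pi)

if __name__ == "__main__":
    for delta in (0.2, 0.1, 0.05):
        est = boundary_mass(toy_router, gaussian_sampler, delta)
        print(delta, est, est / delta, LEADING_CONSTANT)
```

As delta shrinks, the ratio of estimated mass to delta should settle near `LEADING_CONSTANT`, up to Monte Carlo noise.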
Abstract
Softmax-routed mixture-of-experts models approach hard routing as the temperature tends to zero, but this limit is singular near routing ties. This paper studies that singularity at the population level for squared-loss MoE regression. The central object is the \emph{boundary mass}, namely the probability that the top two router scores are separated by only a small margin. Under smoothness and transversality assumptions on the router and input law, we prove coarea/tube estimates showing that this mass is linear in the slab width, with leading constant given by a surface integral over the routing interface in the binary case. These estimates yield quantitative soft-to-hard risk bounds and, under compactness and uniform margin control, $\Gamma$-convergence of the soft objectives to the hard-routing objective. The main conclusion is that the zero-temperature limit is controlled by a thin…
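The singular behavior near routing ties can be illustrated with a small sketch. It assumes the usual softmax gate with scores divided by a temperature and compares it with a one-hot argmax gate. The function names are hypothetical, and the L1 gap used here is only an illustrative discrepancy, not the risk functional analyzed in the paper.

```python
import math


def softmax_gates(scores, temperature):
    """Softmax gate weights at the given temperature (numerically stable)."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hard_gates(scores):
    """One-hot gate on the top-scoring expert (first index wins ties)."""
    winner = max(range(len(scores)), key=lambda i: scores[i])
    return [1.0 if i == winner else 0.0 for i in range(len(scores))]


def gate_gap(scores, temperature):
    """L1 distance between soft and hard gate weights."""
    soft = softmax_gates(scores, temperature)
    hard = hard_gates(scores)
    return sum(abs(s - h) for s, h in zip(soft, hard))


if __name__ == "__main__":
    for temperature in (1.0, 0.1, 0.01):
        clear = gate_gap([1.0, 0.0], temperature)   # margin 1: gap vanishes
        tied = gate_gap([0.3, 0.3], temperature)    # exact tie: gap stays at 1
        print(temperature, clear, tied)
```

For two experts with margin m, the gap equals 2 / (1 + exp(m / temperature)), so it decays exponentially wherever the margin is bounded away from zero and does not decay at an exact tie. This is consistent with the abstract's point that the discrepancy concentrates where the top-two margin is small.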
