Geometry-Preserving Aggregation for Mixture-of-Experts Embedding Models
Sajjad Kachuee, Mohammad Sharifkhani

TL;DR
This paper identifies geometric limitations in the weighted linear aggregation used by standard MoE embedding models and proposes SBA, a geometry-preserving method that maintains hyperspherical structure, leading to improved performance and stability in text embedding tasks.
Contribution
The paper introduces Spherical Barycentric Aggregation (SBA), a geometry-aware aggregation method that preserves the hyperspherical structure of MoE embeddings. It addresses the mismatch between linear aggregation and the hyperspherical geometry of expert outputs found in existing models.
Findings
SBA prevents aggregation-induced collapse.
SBA improves performance on MTEB tasks.
SBA maintains hyperspherical geometry.
Abstract
Mixture-of-Experts (MoE) embedding models combine expert outputs using weighted linear summation, implicitly assuming a linear subspace structure in the embedding space. This assumption is shown to be inconsistent with the geometry of expert representations. Geometric analysis of a modern MoE embedding model reveals that expert outputs lie on a shared hyperspherical manifold characterized by tightly concentrated norms and substantial angular separation. Under this geometry, linear aggregation induces inward collapse toward the manifold interior, distorting vector magnitude and direction and reducing embedding comparability. To address this inconsistency, Spherical Barycentric Aggregation (SBA) is introduced as a geometry-preserving aggregation operator that separates radial and angular components to maintain hyperspherical structure while remaining fully compatible with existing routing…
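The collapse described above follows from a short calculation. Write each expert output as r·u_i with a shared norm r and unit direction u_i, and let the routing weights w_i be non-negative and sum to 1. The squared norm of the linear sum is then r² Σ_i Σ_j w_i w_j cos θ_ij, which is strictly below r² whenever two positively weighted directions differ. The sketch below contrasts that linear combination with one possible radial/angular split. The abstract is truncated and does not give SBA's exact formula, so this is an assumption, not the authors' operator: here the radial part is the weighted mean of expert norms, and the angular part is a weighted Karcher (Fréchet) mean on the unit sphere, a standard spherical barycenter swapped in for illustration. The function names `linear_aggregate` and `spherical_barycentric_aggregate` are hypothetical, and the weights are assumed to come from a softmax-style router.

```python
import math
from typing import List, Sequence

Vector = List[float]


def vec_norm(v: Sequence[float]) -> float:
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length; a zero vector has no direction."""
    n = vec_norm(v)
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


def linear_aggregate(outputs: Sequence[Sequence[float]], weights: Sequence[float]) -> Vector:
    """Standard MoE combination: weighted linear sum of expert outputs."""
    dim = len(outputs[0])
    return [sum(w * o[i] for w, o in zip(weights, outputs)) for i in range(dim)]


def spherical_barycentric_aggregate(
    outputs: Sequence[Sequence[float]],
    weights: Sequence[float],
    max_iters: int = 100,
    tol: float = 1e-12,
) -> Vector:
    """Hypothetical SBA-style aggregation (an assumption, not the paper's exact operator).

    Radial part: weighted mean of the expert output norms.
    Angular part: weighted Karcher mean of the unit directions on the sphere.
    Assumes non-negative weights summing to 1 and directions within one open hemisphere.
    """
    norms = [vec_norm(o) for o in outputs]
    radius = sum(w * r for w, r in zip(weights, norms))
    dirs = [[x / r for x in o] for o, r in zip(outputs, norms)]

    # Initialize with the normalized linear mean of the unit directions.
    mu = normalize(linear_aggregate(dirs, weights))
    for _ in range(max_iters):
        # Log map: weighted sum of tangent vectors at mu, each pointing toward one
        # direction with length equal to the geodesic angle to that direction.
        tangent = [0.0] * len(mu)
        for w, d in zip(weights, dirs):
            c = max(-1.0, min(1.0, sum(a * b for a, b in zip(mu, d))))
            theta = math.acos(c)
            perp = [di - c * mi for di, mi in zip(d, mu)]
            p = vec_norm(perp)
            if theta < 1e-15 or p < 1e-15:
                continue  # direction equals mu, or is antipodal (log map undefined)
            scale = w * theta / p
            tangent = [t + scale * q for t, q in zip(tangent, perp)]
        step = vec_norm(tangent)
        if step < tol:
            break  # weighted tangent sum vanishes: mu is the Karcher mean
        # Exp map: move mu along the geodesic in the tangent direction by angle `step`.
        mu = normalize([math.cos(step) * m + math.sin(step) * t / step
                        for m, t in zip(mu, tangent)])
    # Recombine the angular mean with the aggregated radius.
    return [radius * m for m in mu]


# Usage: three mutually orthogonal experts with equal norm 10.
experts = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
gate = [0.5, 0.3, 0.2]
print(vec_norm(linear_aggregate(experts, gate)))                 # about 6.16: pulled inward
print(vec_norm(spherical_barycentric_aggregate(experts, gate)))  # about 10.0: stays on the sphere
```

The linear result keeps only about 62% of the expert norm because the directions are orthogonal. The spherical variant keeps the norm at the weighted mean radius while still moving the direction toward the more heavily weighted experts. A Karcher mean is used because it is the weighted point on the sphere that minimizes squared geodesic distance. A cheaper alternative is to normalize the linear sum and rescale it. That alternative preserves the norm but not the geodesic weighting.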
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Advanced Graph Neural Networks
