Training Sparse Mixture Of Experts Text Embedding Models
Zach Nussbaum, Brandon Duderstadt

TL;DR
This paper introduces Nomic Embed v2, a sparse mixture of experts text embedding model that achieves high performance with reduced inference latency and memory usage, addressing deployment challenges of large models.
Contribution
It presents the first general-purpose MoE text embedding model that outperforms similar-sized models and is open-sourced for reproducibility.
Findings
Outperforms same-parameter models on benchmarks
Maintains competitive performance with larger models
Reduces inference latency and memory usage
Abstract
Transformer-based text embedding models have improved their performance on benchmarks like MIRACL and BEIR by increasing their parameter counts. However, this scaling approach introduces significant deployment challenges, including increased inference latency and memory usage. These challenges are particularly severe in retrieval-augmented generation (RAG) applications, where large models' increased memory requirements constrain dataset ingestion capacity, and their higher latency directly impacts query-time performance. While causal language models have addressed similar efficiency challenges using Mixture of Experts (MoE) architectures, this approach hasn't been successfully adapted to the general text embedding setting. In this paper, we introduce Nomic Embed v2, the first general purpose MoE text embedding model. Our model outperforms models in the same parameter class on both…
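The efficiency argument in the abstract rests on sparse routing: each token is sent to only a few experts, so the parameters active per token are a fraction of the total. The sketch below illustrates that idea with NumPy. It is not the Nomic Embed v2 implementation; the function name `sparse_moe_layer`, the expert count, the top-k value, the ReLU feed-forward experts, and the mean pooling step are all assumptions chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sparse_moe_layer(tokens, router_w, experts_w1, experts_w2, top_k=2):
    """Hypothetical top-k routed MoE feed-forward layer (illustrative only).

    tokens:     (n_tokens, d_model)
    router_w:   (d_model, n_experts)
    experts_w1: (n_experts, d_model, d_hidden)
    experts_w2: (n_experts, d_hidden, d_model)
    """
    n_experts = router_w.shape[1]
    probs = softmax(tokens @ router_w)                    # (n_tokens, n_experts)
    top_idx = np.argsort(-probs, axis=-1)[:, :top_k]      # chosen experts per token
    top_p = np.take_along_axis(probs, top_idx, axis=-1)
    top_p = top_p / top_p.sum(axis=-1, keepdims=True)     # renormalize gate weights

    out = np.zeros_like(tokens)
    expert_calls = 0
    for e in range(n_experts):
        # Rows routed to expert e, and which top-k slot selected it.
        token_ids, slot = np.nonzero(top_idx == e)
        if token_ids.size == 0:
            continue
        hidden = np.maximum(tokens[token_ids] @ experts_w1[e], 0.0)  # ReLU
        gate = top_p[token_ids, slot][:, None]
        out[token_ids] += gate * (hidden @ experts_w2[e])
        expert_calls += token_ids.size
    return out, expert_calls

# Toy sizes, assumed for the example (not taken from the paper).
rng = np.random.default_rng(0)
n_tokens, d_model, d_hidden, n_experts, top_k = 6, 8, 16, 4, 2
tokens = rng.normal(size=(n_tokens, d_model))
router_w = rng.normal(size=(d_model, n_experts))
experts_w1 = rng.normal(size=(n_experts, d_model, d_hidden))
experts_w2 = rng.normal(size=(n_experts, d_hidden, d_model))

out, expert_calls = sparse_moe_layer(tokens, router_w, experts_w1, experts_w2, top_k)

# Mean-pool token outputs into one unit-length embedding (pooling is an assumption).
embedding = out.mean(axis=0)
embedding = embedding / np.linalg.norm(embedding)
```

In this toy setup every token runs through `top_k` of the `n_experts` experts, so the layer performs `n_tokens * top_k` expert evaluations instead of `n_tokens * n_experts`. That gap between total and active parameters is the source of the latency and memory savings the abstract describes.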
Taxonomy
Topics: Topic Modeling · Expert finding and Q&A systems · Computational and Text Analysis Methods
Methods: Mixture of Experts
