SwiftEmbed: Ultra-Fast Text Embeddings via Static Token Lookup for Real-Time Applications
Edouard Lansiaux, Antoine Simonet, Eric Wiel

TL;DR
SwiftEmbed is a static token embedding serving system for real-time applications. It achieves 1.12 ms p50 latency on single-text requests and a 60.6 MTEB average across 8 tasks, with its strongest results in duplicate detection and semantic similarity.
Contribution
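The paper introduces SwiftEmbed, a Rust-based serving system for static embedding lookup that reaches low-millisecond latency while keeping accuracy competitive with Sentence-BERT on deduplication and similarity workloads; accuracy is substantially lower on classification and complex retrieval tasks.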
Findings
Achieves 1.12 ms p50 latency for single-text requests.
Maintains a 60.6 MTEB average score across 8 tasks.
Demonstrates strong performance in duplicate detection (90.1% AP) and semantic similarity (76.1% Spearman correlation).
Abstract
We present SwiftEmbed, a production-oriented serving system for static token embeddings that achieves 1.12 ms p50 latency for single-text requests while maintaining a 60.6 MTEB average score across 8 representative tasks. Built around the open-source Potion-base-8M distilled model from MinishLab and implemented in Rust, the system delivers 50,000 requests per second through static embedding lookup, mean pooling, and zero-copy IEEE 754 binary serialization. Evaluation demonstrates exceptional duplicate detection performance (90.1% AP) and strong semantic similarity (76.1% Spearman correlation). Performance relative to Sentence-BERT is task-dependent: robust for deduplication and similarity workloads (89-100%), substantially lower for classification and complex retrieval tasks (75%). Domain-specific performance ranges from 75% to 131% of a GloVe-840B baseline. The system targets…
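The pipeline named in the abstract (static embedding lookup, mean pooling, binary float serialization) can be illustrated with a minimal sketch. This is not the authors' Rust implementation: the vocabulary, the 4-dimensional table, the whitespace tokenizer, and the function names `embed`, `serialize`, and `deserialize` are all hypothetical, chosen only to show the shape of the computation. The real system uses the Potion-base-8M model and its own tokenizer, and `struct.pack` copies data rather than being zero-copy.

```python
import struct

# Hypothetical toy vocabulary and embedding table (illustrative values only;
# the real system loads the Potion-base-8M model instead).
VOCAB = {"fast": 0, "text": 1, "embeddings": 2}
TABLE = [
    [1.0, 0.0, 2.0, 4.0],  # "fast"
    [3.0, 2.0, 0.0, 0.0],  # "text"
    [2.0, 4.0, 1.0, 2.0],  # "embeddings"
]
DIM = 4


def embed(text):
    """Look up one static vector per known token and mean-pool them."""
    # Assumption: whitespace tokenization; unknown tokens are skipped.
    ids = [VOCAB[tok] for tok in text.lower().split() if tok in VOCAB]
    if not ids:
        return [0.0] * DIM
    return [sum(TABLE[i][d] for i in ids) / len(ids) for d in range(DIM)]


def serialize(vec):
    """Pack the vector as little-endian IEEE 754 float32 values."""
    return struct.pack(f"<{len(vec)}f", *vec)


def deserialize(buf):
    """Unpack little-endian float32 bytes back into a list of floats."""
    return list(struct.unpack(f"<{len(buf) // 4}f", buf))


vector = embed("fast text embeddings")
payload = serialize(vector)
print(vector, len(payload))
```

Because no neural network runs at request time, the per-request cost is a handful of table reads and one average, which is the property that makes low-millisecond latency plausible. A binary payload of raw float32 values also lets a client reinterpret the bytes directly instead of parsing text.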
