SwiftEmbed: Ultra-Fast Text Embeddings via Static Token Lookup for Real-Time Applications

Edouard Lansiaux; Antoine Simonet; Eric Wiel

arXiv:2510.24793·cs.CL·March 10, 2026

TL;DR

SwiftEmbed is a serving system for static token embeddings aimed at real-time applications. It reports 1.12 ms p50 latency for single-text requests and a 60.6 MTEB average score across 8 tasks, with its strongest results in duplicate detection and semantic similarity.

Contribution

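The paper introduces SwiftEmbed, a Rust-based serving system for static embedding lookup built around the Potion-base-8M distilled model. It reduces latency while keeping accuracy competitive on deduplication and similarity workloads; accuracy is substantially lower on classification and complex retrieval tasks.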

Findings

01. Achieves 1.12 ms p50 latency for single-text requests.

02. Maintains a 60.6 MTEB average score across 8 representative tasks.

03. Reports 90.1% AP on duplicate detection and 76.1% Spearman correlation on semantic similarity.

Abstract

We present SwiftEmbed, a production-oriented serving system for static token embeddings that achieves 1.12 ms p50 latency for single-text requests while maintaining a 60.6 MTEB average score across 8 representative tasks. Built around the open-source Potion-base-8M distilled model from MinishLab and implemented in Rust, the system delivers 50,000 requests per second through static embedding lookup, mean pooling, and zero-copy IEEE754 binary serialization. Evaluation demonstrates exceptional duplicate detection performance (90.1% AP) and strong semantic similarity (76.1% Spearman correlation). Performance relative to Sentence-BERT is task-dependent: robust for deduplication and similarity workloads (89–100%), substantially lower for classification and complex retrieval tasks (75%). Domain-specific performance ranges from 75% to 131% of a GloVe-840B baseline. The system targets…
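The three stages named in the abstract (static embedding lookup, mean pooling, and binary float serialization) can be illustrated with a minimal sketch. This is not the authors' Rust implementation: the toy vocabulary, the 4-dimensional table, the whitespace tokenizer, and the function names `tokenize`, `embed`, `serialize`, and `deserialize` are all assumptions made for illustration, and Python's `memoryview` stands in for the zero-copy idea.

```python
import struct
from array import array

# Hypothetical toy vocabulary and embedding table (illustration only).
# A real deployment would load a distilled table such as Potion-base-8M.
VOCAB = {"fast": 0, "text": 1, "embeddings": 2, "[UNK]": 3}
TABLE = [
    [1.0, 0.0, 2.0, 0.0],  # "fast"
    [0.0, 1.0, 0.0, 2.0],  # "text"
    [2.0, 2.0, 1.0, 1.0],  # "embeddings"
    [0.0, 0.0, 0.0, 0.0],  # "[UNK]"
]
DIM = 4


def tokenize(text):
    """Map whitespace-separated tokens to ids (assumed tokenizer;
    production systems typically use a subword tokenizer)."""
    unk = VOCAB["[UNK]"]
    return [VOCAB.get(tok, unk) for tok in text.lower().split()]


def embed(text):
    """Static lookup followed by mean pooling; no neural forward pass."""
    ids = tokenize(text)
    pooled = array("f", [0.0] * DIM)  # contiguous float32 buffer
    if not ids:
        return pooled
    for token_id in ids:
        row = TABLE[token_id]
        for d in range(DIM):
            pooled[d] += row[d]
    for d in range(DIM):
        pooled[d] /= len(ids)
    return pooled


def serialize(vec):
    """Expose the float32 buffer as raw bytes without copying it.
    Byte order is the machine's native order in this sketch."""
    return memoryview(vec).cast("B")


def deserialize(payload):
    """Client-side decoding of the raw float32 payload (native byte order)."""
    return list(struct.unpack(f"{len(payload) // 4}f", payload))
```

For example, `embed("fast text")` averages the rows for "fast" and "text", and `deserialize(bytes(serialize(vec)))` recovers the same four floats on the receiving side. Because the per-request work is a handful of table reads and additions, cost grows with the number of tokens rather than with model depth, which is the property the abstract relies on for low latency.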
