Beyond the Geometric Curse: High-Dimensional N-Gram Hashing for Dense Retrieval

Sangeet Sharma

arXiv:2601.15205·cs.IR·January 22, 2026

Open Access

TL;DR

This paper introduces NUMEN, a training-free, high-dimensional hashing method for dense retrieval that surpasses traditional sparse methods like BM25 by removing the dimensionality bottleneck.

Contribution

NUMEN demonstrates that a training-free dense retriever built on deterministic hashing can outperform the sparse BM25 baseline on the LIMIT benchmark.

Findings

01

On the LIMIT benchmark, NUMEN achieves 93.90% Recall@100 at 32,768 dimensions.

02

NUMEN surpasses the sparse BM25 baseline of 93.6% on the LIMIT benchmark.

03

Removing the embedding bottleneck improves dense retrieval performance.

Abstract

Why do even the most powerful 7B-parameter embedding models struggle with simple retrieval tasks that the decades-old BM25 handles with ease? Recent theory attributes this to a dimensionality bottleneck, which arises when we force infinite linguistic nuances into small, fixed-length learned vectors. We developed NUMEN to break this bottleneck by removing the learning process entirely. Instead of training heavy layers to map text into a constrained space, NUMEN uses deterministic character hashing to project language directly onto high-dimensional vectors. This approach requires no training, supports an unlimited vocabulary, and allows the geometric capacity to scale as needed. On the LIMIT benchmark, NUMEN achieves 93.90% Recall@100 at 32,768 dimensions. This makes it the first dense retrieval model to officially surpass the sparse BM25 baseline of 93.6%. Our findings show…
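
The abstract does not spell out the hashing scheme, so the following is only a rough illustration of what "deterministic character hashing onto high-dimensional vectors" could look like. It assumes character 3- to 5-grams, CRC32 bucket assignment, raw counts, and L2 normalisation; none of these choices is confirmed by the source, and the names `char_ngrams`, `hash_embed`, and `cosine` are hypothetical. It should be read as a sketch of the general feature-hashing idea, not as NUMEN itself.

```python
import math
import zlib

DIM = 32_768  # dimensionality quoted in the abstract; any positive size works


def char_ngrams(text, n_values=(3, 4, 5)):
    """Yield overlapping character n-grams (the n-gram sizes are an assumption)."""
    text = text.lower()
    for n in n_values:
        for i in range(len(text) - n + 1):
            yield text[i:i + n]


def hash_embed(text, dim=DIM):
    """Map text to an L2-normalised vector by hashing character n-grams."""
    vec = [0.0] * dim
    for gram in char_ngrams(text):
        # CRC32 is deterministic across runs, unlike Python's salted hash().
        bucket = zlib.crc32(gram.encode("utf-8")) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def cosine(a, b):
    """Dot product of two vectors that are already L2-normalised."""
    return sum(x * y for x, y in zip(a, b))


# Toy usage: rank two made-up documents against a query.
docs = ["Jon likes apples and quokkas", "Maria likes chess and hiking"]
doc_vecs = [hash_embed(d) for d in docs]
query_vec = hash_embed("who likes quokkas?")
scores = [cosine(query_vec, v) for v in doc_vecs]
best = max(range(len(docs)), key=scores.__getitem__)
```

Because nothing is learned, the only knobs in this sketch are the n-gram sizes and `dim`. Raising `dim` reduces hash collisions between unrelated n-grams, which is one plausible reading of the abstract's claim that geometric capacity can scale as needed.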

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Face recognition and analysis · Advanced Neural Network Applications