Improving Search Suggestions for Alphanumeric Queries
Samarth Agrawal, Jayanth Yetukuri, Diptesh Kanojia, Qunzhi Zhou, Zhe Wu

TL;DR
This paper introduces a character-level, training-free retrieval method for alphanumeric search queries that improves efficiency and accuracy in e-commerce search suggestions, outperforming traditional models.
Contribution
The authors present a novel fixed-length binary vector encoding for alphanumeric identifiers, enabling fast similarity search with optional re-ranking, suitable for production environments.
Findings
Significant business metric improvements in A/B testing
Efficient retrieval via Hamming distance over large corpora
Enhanced precision with optional edit distance re-ranking
Abstract
Alphanumeric identifiers such as manufacturer part numbers (MPNs), SKUs, and model codes are ubiquitous in e-commerce catalogs and search. These identifiers are sparse, non-linguistic, and highly sensitive to tokenization and typographical variation, rendering conventional lexical and embedding-based retrieval methods ineffective. We propose a training-free, character-level retrieval framework that encodes each alphanumeric sequence as a fixed-length binary vector. This representation enables efficient similarity computation via Hamming distance and supports nearest-neighbor retrieval over large identifier corpora. An optional re-ranking stage using edit distance refines precision while preserving latency guarantees. The method offers a practical and interpretable alternative to learned dense retrieval models, making it suitable for production deployment in search suggestion generation…
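The pipeline the abstract describes (fixed-length binary encoding, Hamming-distance retrieval, optional edit-distance re-ranking) can be illustrated with a minimal Python sketch. The abstract does not specify the encoding, so everything concrete here is an assumption made for illustration: the 36-symbol alphabet, the 16-character cap, the position-wise one-hot layout, the brute-force scan in place of a real nearest-neighbor index, and the names `normalize`, `encode`, `hamming`, `edit_distance`, and `suggest`. It should not be read as the authors' implementation.

```python
from typing import List

# Assumed alphabet and maximum length; the paper's actual choices may differ.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
MAX_LEN = 16
CHAR_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def normalize(identifier: str) -> str:
    """Uppercase, drop characters outside the alphabet, truncate to MAX_LEN."""
    kept = [ch for ch in identifier.upper() if ch in CHAR_INDEX]
    return "".join(kept)[:MAX_LEN]


def encode(identifier: str) -> int:
    """Hypothetical encoding: one-hot per position, packed into a single int.

    The vector has MAX_LEN * len(ALPHABET) bits; unused positions stay zero.
    """
    bits = 0
    for pos, ch in enumerate(normalize(identifier)):
        bits |= 1 << (pos * len(ALPHABET) + CHAR_INDEX[ch])
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two encoded identifiers."""
    return bin(a ^ b).count("1")


def edit_distance(s: str, t: str) -> int:
    """Classic Levenshtein distance using two rolling rows."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (cs != ct)))  # substitution
        prev = curr
    return prev[-1]


def suggest(query: str, corpus: List[str], k: int = 3,
            rerank: bool = True) -> List[str]:
    """Take the k nearest identifiers by Hamming distance, then optionally
    re-order those k candidates by edit distance to the query."""
    q = encode(query)
    # Brute-force scan; a production system would precompute the encodings
    # and use an index rather than encoding the corpus per query.
    candidates = sorted(corpus, key=lambda c: hamming(q, encode(c)))[:k]
    if rerank:
        nq = normalize(query)
        candidates.sort(key=lambda c: edit_distance(nq, normalize(c)))
    return candidates
```

For example, `suggest("wd40-x1", ["WD40X1", "WD40X2", "ZZ9999"], k=2)` ranks `"WD40X1"` first, because normalization removes the hyphen and case difference before encoding. Under this assumed positional layout, a single substituted character flips exactly two bits, while an inserted or deleted character shifts every later position and inflates the Hamming distance. Re-ranking can only re-order the candidates that the first stage returned.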
