Ultra-Quantisation: Efficient Embedding Search via 1.58-bit Encodings

Richard Connor; Alan Dearle; Ben Claydon

arXiv:2506.00528 · cs.LG · June 3, 2025

Open Access

TL;DR

This paper introduces Ultra-Quantisation, a method that replaces high-dimensional floating point embeddings with compact vectors whose elements are drawn from {-1,0,1}, greatly reducing storage and comparison cost while maintaining a strong correlation with the original similarity.

Contribution

It presents a novel quantisation technique using convex polytopes to convert embeddings into ultra-compact {-1,0,1} vectors with minimal loss of similarity information.

Findings

01 Significant reduction in storage and computation costs.

02 High correlation maintained in similarity measurements.

03 Effective quantisation using convex polytopes in high-dimensional space.
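To make the storage finding concrete, here is a rough back-of-the-envelope sketch. The "1.58-bit" figure in the title matches log2(3), the information content of one element drawn from three values. The dimension of 1024, the 32-bit float baseline, and the helper name `storage_bytes` are assumptions chosen for illustration; they are not taken from the paper.

```python
import math

# Information content of one ternary element: log2(3), about 1.585 bits.
BITS_PER_TERNARY = math.log2(3)

def storage_bytes(dims, bits_per_element):
    """Bytes needed for one vector, rounded up to a whole byte.

    Hypothetical helper for illustration only.
    """
    return math.ceil(dims * bits_per_element / 8)

# Assumed example: a 1024-dimensional embedding.
dims = 1024
float32_size = storage_bytes(dims, 32)                 # 32-bit floats
two_bit_size = storage_bytes(dims, 2)                  # simple 2-bit packing
ideal_size = storage_bytes(dims, BITS_PER_TERNARY)     # information-theoretic limit

print(float32_size, two_bit_size, ideal_size)
```

Under these assumptions a simple 2-bit packing is 16 times smaller than the float32 vector, and the information-theoretic limit is a little smaller again.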

Abstract

Many modern search domains comprise high-dimensional vectors of floating point numbers derived from neural networks, in the form of embeddings. Typical embeddings range in size from hundreds to thousands of dimensions, making the size of the embeddings, and the speed of comparison, a significant issue. Quantisation is a class of mechanism which replaces the floating point values with a smaller representation, for example a short integer. This gives an approximation of the embedding space in return for a smaller data representation and a faster comparison function. Here we take this idea almost to its extreme: we show how vectors of arbitrary-precision floating point values can be replaced by vectors whose elements are drawn from the set {-1,0,1}. This yields very significant savings in space and metric evaluation cost, while maintaining a strong correlation for similarity…
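As a minimal sketch of the general idea in the abstract, the following replaces each floating point element with a value from {-1,0,1} and compares the resulting vectors with an integer dot product. The fixed-threshold rule, the threshold value, and the function names are assumptions made for illustration; the paper's own construction is based on convex polytopes and is not reproduced here.

```python
import numpy as np

def ternary_quantise(x, threshold):
    """Map each element of x to -1, 0 or +1.

    Assumed rule (not the paper's method): values above +threshold
    become +1, values below -threshold become -1, the rest become 0.
    """
    q = np.zeros(x.shape, dtype=np.int8)
    q[x > threshold] = 1
    q[x < -threshold] = -1
    return q

def ternary_similarity(a, b):
    """Integer dot product of two ternary vectors.

    Widened to int32 so the sum cannot overflow int8.
    """
    return int(np.dot(a.astype(np.int32), b.astype(np.int32)))

# Assumed toy vectors and threshold, chosen only to show the mechanics.
u = np.array([0.9, -0.05, -0.7, 0.1])
v = np.array([0.8, 0.6, -0.9, -0.2])

qu = ternary_quantise(u, threshold=0.5)
qv = ternary_quantise(v, threshold=0.5)

print(qu, qv, ternary_similarity(qu, qv))
```

Because every element is -1, 0 or +1, the comparison needs only integer additions and subtractions, which is where the saving in metric evaluation cost would come from in a scheme of this kind.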

Taxonomy

Topics: Advanced Data Compression Techniques · Error Correcting Code Techniques · Blind Source Separation Techniques

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Sparse Evolutionary Training