Embedding Compression via Spherical Coordinates
Han Xiao

TL;DR
This paper introduces an embedding compression technique based on spherical coordinates that achieves 1.5x compression, 25% better than the best prior lossless method, with reconstruction error bounded by float32 machine epsilon and consistent gains across text, image, and multi-vector embeddings.
Contribution
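The method converts unit-norm embeddings to spherical coordinates, whose angles concentrate around π/2 in high dimensions. As a result, the IEEE 754 exponents of the angles collapse to a single value and their high-order mantissa bits become predictable, so both can be entropy coded. Reconstruction error stays within float32 machine epsilon.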
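The coordinate change described above can be sketched as follows. This is a minimal sketch that assumes the standard hyperspherical parameterization; the paper's exact angle convention and encoding are not reproduced here. The helper names `to_spherical` and `from_spherical` are hypothetical. A unit vector in n dimensions maps to n-1 angles, because the radius is always 1 and can be dropped. Rounding the angles to float32 is only a naive stand-in for the paper's scheme.

```python
import numpy as np


def to_spherical(x: np.ndarray) -> np.ndarray:
    """Map a unit vector in R^n to n-1 hyperspherical angles (radius dropped, assumed 1)."""
    x = np.asarray(x, dtype=np.float64)
    n = x.shape[-1]
    # tail[k] = ||x[k:]||, computed for every k with one reversed cumulative sum
    tail = np.sqrt(np.cumsum(x[::-1] ** 2)[::-1])
    angles = np.empty(n - 1)
    # Polar angles: phi_k = atan2(||x[k+1:]||, x[k]) lies in [0, pi] for k = 0..n-3
    angles[: n - 2] = np.arctan2(tail[1 : n - 1], x[: n - 2])
    # Azimuth: the last angle spans the full circle, (-pi, pi]
    angles[n - 2] = np.arctan2(x[n - 1], x[n - 2])
    return angles


def from_spherical(angles: np.ndarray) -> np.ndarray:
    """Inverse of to_spherical: rebuild the unit vector from its n-1 angles."""
    angles = np.asarray(angles, dtype=np.float64)
    n = angles.shape[-1] + 1
    x = np.empty(n)
    sin_prod = 1.0  # running product sin(phi_0) * ... * sin(phi_{k-1})
    for k in range(n - 1):
        x[k] = sin_prod * np.cos(angles[k])
        sin_prod *= np.sin(angles[k])
    x[n - 1] = sin_prod
    return x


# Usage: a random 768-d unit vector (hypothetical stand-in for a real embedding)
rng = np.random.default_rng(42)
v = rng.standard_normal(768)
v /= np.linalg.norm(v)

phi = to_spherical(v)                               # 767 angles
v_back = from_spherical(phi)                        # float64 round trip
v_back32 = from_spherical(phi.astype(np.float32))   # angles rounded to float32 first
print("max error, float64 angles:", np.max(np.abs(v - v_back)))
print("max error, float32 angles:", np.max(np.abs(v - v_back32)))
```

Using `arctan2` on precomputed tail norms, rather than `arccos` of a ratio, avoids the precision loss `arccos` suffers for arguments near ±1.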
Findings
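Achieves 1.5x compression, 25% better than the best prior lossless method.
Shows no measurable retrieval degradation on BEIR benchmarks.
Delivers consistent compression gains across 26 configurations spanning text, image, and multi-vector embeddings.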
Abstract
We present an ε-bounded compression method for unit-norm embeddings that achieves 1.5× compression, 25% better than the best prior lossless method. The method exploits the fact that the spherical coordinates of high-dimensional unit vectors concentrate around π/2, causing IEEE 754 exponents to collapse to a single value and high-order mantissa bits to become predictable, enabling entropy coding of both. Reconstruction error is bounded by float32 machine epsilon (2⁻²³ ≈ 1.19 × 10⁻⁷), making reconstructed values indistinguishable from the originals at float32 precision. Evaluation across 26 configurations spanning text, image, and multi-vector embeddings confirms consistent compression improvement with zero measurable retrieval degradation on BEIR benchmarks.
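The concentration effect described in the abstract can be checked empirically. The sketch below assumes normalized Gaussian samples as a stand-in for real embeddings (real embeddings are not isotropic, so the numbers will differ) and uses a hypothetical `hyperspherical_angles` helper built on the standard hyperspherical parameterization. It compares the empirical entropy of the float32 exponent field and of the top four mantissa bits for raw coordinates versus angles. It only measures how predictable those bit fields are; it is not the paper's entropy coder.

```python
import numpy as np


def hyperspherical_angles(X: np.ndarray) -> np.ndarray:
    """Row-wise hyperspherical angles: unit rows of shape (m, n) -> (m, n-1)."""
    # tail[:, k] = ||X[:, k:]||, via a reversed cumulative sum along each row
    tail = np.sqrt(np.cumsum(X[:, ::-1] ** 2, axis=1)[:, ::-1])
    polar = np.arctan2(tail[:, 1:-1], X[:, :-2])  # n-2 angles in [0, pi]
    azimuth = np.arctan2(X[:, -1], X[:, -2])      # last angle in (-pi, pi]
    return np.concatenate([polar, azimuth[:, None]], axis=1)


def entropy_bits(symbols: np.ndarray) -> float:
    """Empirical Shannon entropy, in bits per symbol, of a discrete array."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def float32_fields(values: np.ndarray, top_bits: int = 4):
    """Cast to float32 and split out the 8-bit biased exponent and the top mantissa bits."""
    bits = np.ascontiguousarray(values, dtype=np.float32).view(np.uint32)
    exponent = (bits >> 23) & 0xFF
    mantissa_top = (bits >> (23 - top_bits)) & ((1 << top_bits) - 1)
    return exponent, mantissa_top


# Hypothetical stand-in data: 200 random 768-d unit vectors
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 768))
X /= np.linalg.norm(X, axis=1, keepdims=True)

raw_exp, raw_man = float32_fields(X)
ang_exp, ang_man = float32_fields(hyperspherical_angles(X))

raw_exp_h, ang_exp_h = entropy_bits(raw_exp), entropy_bits(ang_exp)
raw_man_h, ang_man_h = entropy_bits(raw_man), entropy_bits(ang_man)
# Biased exponent 127 <=> magnitude in [1, 2), which contains pi/2
share_127 = float(np.mean(ang_exp == 127))

print(f"exponent entropy:     raw {raw_exp_h:.2f} bits, angles {ang_exp_h:.2f} bits")
print(f"top-4 mantissa bits:  raw {raw_man_h:.2f} bits, angles {ang_man_h:.2f} bits")
print(f"angles with magnitude in [1, 2): {share_127:.1%}")
```

On this synthetic data nearly all angles share biased exponent 127; the exceptions come mostly from the final angles, where the remaining tail norm is small and the angle is no longer tightly concentrated. An entropy coder such as Huffman or ANS could then code the exponent and top mantissa fields separately from the low-order bits; that split is an assumption of this sketch rather than a detail taken from the paper.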
Taxonomy
Topics: Advanced Data Compression Techniques · Algorithms and Data Compression · Video Coding and Compression Technologies
