Coordinate Heterogeneity Governs Binary Quantization: From InfoNCE to Recall
Wenxuan Xiao

TL;DR
This paper develops a theoretical framework linking coordinate heterogeneity to binary quantization performance, explaining when different strategies are effective and guiding system design.
Contribution
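It introduces a complete analytical model connecting the Gaussian structure of InfoNCE-trained embeddings to BQ quality. The model shows how coordinate heterogeneity shapes ranking fidelity and when each system strategy (random rotation or preserving coordinate axes) is appropriate.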
Findings
Coordinate heterogeneity governs BQ performance.
Random rotation flattens coordinate heterogeneity, destroying signal that BQ can use.
A scaling law predicts BQ fidelity across models.
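The rotation finding can be made concrete with a small numerical sketch. This is a hedged illustration, not the paper's experiment. It assumes zero-mean Gaussian embeddings with a hypothetical geometric decay of per-coordinate variances. It measures heterogeneity as the coefficient of variation of those variances, an illustrative statistic that may differ from the paper's definition. After a random orthogonal rotation Q, coordinate i has variance sum_j Q_ij^2 sigma_j^2. Each rotated variance is therefore a random weighted average of the originals, so the spread collapses toward the mean while the total variance (the trace) stays fixed.

```python
import math
import random


def random_orthogonal(d, rng):
    """Haar-random orthogonal matrix: modified Gram-Schmidt on Gaussian rows."""
    rows = []
    while len(rows) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Subtract the projection onto each accepted row, using the updated v.
        for q in rows:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-10:  # skip (vanishingly unlikely) near-dependent draws
            rows.append([a / norm for a in v])
    return rows


def heterogeneity(variances):
    """Coefficient of variation of per-coordinate variances (0 = perfectly uniform)."""
    mean = sum(variances) / len(variances)
    spread = sum((v - mean) ** 2 for v in variances) / len(variances)
    return math.sqrt(spread) / mean


def rotated_variances(q, variances):
    """Diagonal of Q diag(variances) Q^T: each coordinate's variance after rotation."""
    return [sum(qij * qij * s for qij, s in zip(row, variances)) for row in q]


d = 64
rng = random.Random(0)
# Hypothetical heterogeneous spectrum: variances decay geometrically across axes.
sigma2 = [0.95 ** i for i in range(d)]
q = random_orthogonal(d, rng)
before = heterogeneity(sigma2)
after = heterogeneity(rotated_variances(q, sigma2))
print(f"heterogeneity before rotation: {before:.3f}")
print(f"heterogeneity after rotation:  {after:.3f}")
```

With this spectrum the statistic drops sharply after rotation. The exact values depend on the seed and on the assumed decay rate.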
Abstract
Binary quantization (BQ) compresses high-dimensional embeddings into one or two bits per coordinate, enabling nearest-neighbor search at extreme speed. Yet a striking puzzle persists: BQ achieves competitive recall on contrastive embeddings but fails on others, and two leading systems adopt diametrically opposite strategies (random rotation vs. preserving coordinate axes) without a common theory explaining when each is appropriate. We resolve this puzzle by connecting the Gaussian structure recently established for InfoNCE-trained representations to a complete analytical framework for BQ quality. The key insight is that coordinate heterogeneity, the non-uniformity of per-coordinate variances, governs the main aspects of BQ performance. We derive closed-form expressions for ranking fidelity, prove that the magnitude bit carries information proportional to heterogeneity, and show…
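The one- and two-bit encodings described in the abstract can be sketched in pure Python. This sketch rests on assumptions of our own and does not reproduce the scheme of any particular system or of the paper.

* **1-bit code.** It keeps each coordinate's sign and ranks candidates by Hamming distance.
* **2-bit code.** It adds a hypothetical magnitude bit that records whether |x_i| is at or above that coordinate's median. That bit is decoded to per-coordinate reconstruction levels.

The data are synthetic heterogeneous Gaussian vectors, and recall@10 is measured against exact inner-product search. Helper names such as `fit_magnitude_levels` are illustrative.

```python
import random


def sign_code(x):
    """1-bit BQ: bit i is set when x[i] > 0."""
    code = 0
    for i, v in enumerate(x):
        if v > 0:
            code |= 1 << i
    return code


def magnitude_code(x, thresholds):
    """Hypothetical second bit: set when |x[i]| is at or above coordinate i's threshold."""
    code = 0
    for i, (v, t) in enumerate(zip(x, thresholds)):
        if abs(v) >= t:
            code |= 1 << i
    return code


def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


def fit_magnitude_levels(data):
    """Per coordinate: median |x| as threshold, mean |x| below/above it as levels."""
    thresholds, lo, hi = [], [], []
    for i in range(len(data[0])):
        mags = sorted(abs(x[i]) for x in data)
        t = mags[len(mags) // 2]
        below = [m for m in mags if m < t]
        above = [m for m in mags if m >= t]
        thresholds.append(t)
        lo.append(sum(below) / len(below))
        hi.append(sum(above) / len(above))
    return thresholds, lo, hi


def decode(sign, mag, lo, hi):
    """Turn (sign bit, magnitude bit) per coordinate back into a float vector."""
    out = []
    for i, (l, h) in enumerate(zip(lo, hi)):
        level = h if (mag >> i) & 1 else l
        out.append(level if (sign >> i) & 1 else -level)
    return out


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def top_k(scores, k):
    """Indices of the k highest scores (ties broken by lower index)."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def recall_at_k(approx, exact):
    """Fraction of the exact top-k that the approximate top-k recovers."""
    return len(set(approx) & set(exact)) / len(exact)


rng = random.Random(1)
d, n, k = 32, 500, 10
# Hypothetical heterogeneous Gaussian embeddings: per-axis std decays geometrically.
stds = [0.9 ** i for i in range(d)]
data = [[rng.gauss(0.0, s) for s in stds] for _ in range(n)]
query = [rng.gauss(0.0, s) for s in stds]

exact = top_k([dot(query, x) for x in data], k)

# 1-bit: rank by Hamming distance between sign codes (negated so larger = closer).
q_code = sign_code(query)
one_bit = top_k([-hamming(q_code, sign_code(x)) for x in data], k)

# 2-bit: float query scored against decoded (sign, magnitude) database codes.
thr, lo, hi = fit_magnitude_levels(data)
decoded = [decode(sign_code(x), magnitude_code(x, thr), lo, hi) for x in data]
two_bit = top_k([dot(query, y) for y in decoded], k)

print(f"recall@{k}, 1-bit: {recall_at_k(one_bit, exact):.2f}")
print(f"recall@{k}, 2-bit: {recall_at_k(two_bit, exact):.2f}")
```

The decoded levels scale with each coordinate's spread, so the 2-bit score weights high-variance axes more heavily. Plain Hamming distance, by contrast, counts every sign bit equally. The printed recalls vary with the seed, dimension, and assumed variance profile.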
