Random Cycle Coding: Lossless Compression of Cluster Assignments via Bits-Back Coding
Daniel Severo, Ashish Khisti, Alireza Makhzani

TL;DR
Random Cycle Coding (RCC) is an optimal, training-free lossless compression method for cluster assignments. It achieves lower bit rates than previous techniques while using less compute and memory, and yields savings of up to 70% in vector database systems.
Contribution
RCC introduces a novel permutation-based encoding method that is optimal, scalable, and does not require training, improving compression efficiency for cluster assignments.
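According to the abstract, RCC conveys cluster assignments through the cycles of a permutation. The sketch below illustrates only that correspondence, under simplifying assumptions. Elements are integer indices, a permutation is stored as a successor map, and the helper names `clusters_to_permutation` and `permutation_to_clusters` are hypothetical. It omits the bits-back and entropy-coding machinery RCC uses, and it does not model how RCC derives the permutation from the order of encoded elements.

```python
from typing import Dict, List


def clusters_to_permutation(clusters: List[List[int]]) -> Dict[int, int]:
    """Hypothetical helper: build a permutation (successor map) whose cycles are the clusters.

    Each cluster [a, b, c] becomes the cycle a -> b -> c -> a. The in-cluster
    order used here is one arbitrary choice among (len - 1)! cyclic orders.
    """
    perm: Dict[int, int] = {}
    for cluster in clusters:
        for i, elem in enumerate(cluster):
            perm[elem] = cluster[(i + 1) % len(cluster)]
    return perm


def permutation_to_clusters(perm: Dict[int, int]) -> List[List[int]]:
    """Hypothetical helper: recover the clusters by walking the cycles of the permutation."""
    seen = set()
    clusters: List[List[int]] = []
    for start in sorted(perm):
        if start in seen:
            continue
        cycle = []
        elem = start
        # Follow successors until we return to an element already visited.
        while elem not in seen:
            seen.add(elem)
            cycle.append(elem)
            elem = perm[elem]
        clusters.append(cycle)
    return clusters


clusters = [[0, 3, 5], [1, 4], [2]]
perm = clusters_to_permutation(clusters)
recovered = permutation_to_clusters(perm)
# Compare as sets of sets: neither cluster order nor in-cluster order matters.
assert {frozenset(c) for c in recovered} == {frozenset(c) for c in clusters}
print(recovered)  # [[0, 3, 5], [1, 4], [2]]
```

Any clustering can be written as the cycles of some permutation and read back by following successors. This is why a permutation's cycle structure is able to carry cluster assignments.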
Findings
RCC achieves lower bit rates than previous methods.
RCC requires less compute and memory than previous methods.
Applied to vector databases, RCC saves up to 70% in storage costs.
Abstract
We present an optimal method for encoding cluster assignments of arbitrary data sets. Our method, Random Cycle Coding (RCC), encodes data sequentially and sends assignment information as cycles of the permutation defined by the order of encoded elements. RCC does not require any training and its worst-case complexity scales quasi-linearly with the size of the largest cluster. We characterize the achievable bit rates as a function of cluster sizes and number of elements, showing RCC consistently outperforms previous methods while requiring less compute and memory. Experiments show RCC can save up to 2 bytes per element when applied to vector databases, and removes the need for assigning integer ids to identify vectors, translating to savings of up to 70% in vector database systems for similarity search applications.
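As a worked illustration of how a count can depend only on cluster sizes and the number of elements, the sketch below computes a standard combinatorial quantity. It is not the paper's rate formula, which is not reproduced here. A cluster of size s can be arranged as a single cycle in (s - 1)! ways, so exactly ∏(n_i - 1)! permutations of the n elements have cycles equal to a given clustering. In a bits-back scheme, the encoder's freedom in choosing among such permutations is the kind of slack that can plausibly be recovered as bits. The function names are hypothetical.

```python
import math
from typing import List


def log2_factorial(m: int) -> float:
    """log2(m!) via the log-gamma function, avoiding huge integers for large m."""
    return math.lgamma(m + 1) / math.log(2)


def log2_cyclic_orderings(cluster_sizes: List[int]) -> float:
    """Hypothetical helper: log2 of the number of permutations whose cycles are the clusters.

    Each cluster of size s contributes (s - 1)! cyclic orders, and the choices
    for different clusters are independent, so the logs add.
    """
    return sum(log2_factorial(s - 1) for s in cluster_sizes)


sizes = [3, 2, 1]
n = sum(sizes)
# For comparison: log2 of the number of all orderings of the n elements.
print(f"log2(n!)          = {log2_factorial(n):.3f} bits")
print(f"log2 prod (s-1)!  = {log2_cyclic_orderings(sizes):.3f} bits")
```

For sizes 3, 2, and 1, there are 2! · 1! · 0! = 2 such permutations (1 bit) out of 6! = 720 orderings overall.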
Taxonomy
Topics: Advanced Data Compression Techniques · Cellular Automata and Applications · Algorithms and Data Compression
