Practical Data-Dependent Metric Compression with Provable Guarantees
Piotr Indyk, Ilya Razenshteyn, Tal Wagner

TL;DR
This paper presents a simple method, backed by provable guarantees, for compressing high-dimensional points into compact representations that preserve pairwise distances up to a small error. In experiments it matches or outperforms the Product Quantization heuristic.
Contribution
The paper introduces a data-dependent metric compression algorithm with provable guarantees. The algorithm is simpler than the earlier near-optimal construction and performs comparably to or better than Product Quantization, a state-of-the-art heuristic.
Findings
The method achieves accuracy comparable to or better than Product Quantization.
The representation is far smaller than the raw coordinates, while distances are provably preserved up to a small multiplicative error.
The algorithm almost matches the recent bound of \cite{indyk2017near} while being much simpler.
Abstract
We introduce a new distance-preserving compact representation of multi-dimensional point-sets. Given $n$ points in a $d$-dimensional space where each coordinate is represented using $B$ bits (i.e., $dB$ bits per point), it produces a representation of size $O(d \log(dB/\epsilon) + \log n)$ bits per point from which one can approximate the distances up to a factor of $1 \pm \epsilon$. Our algorithm almost matches the recent bound of~\cite{indyk2017near} while being much simpler. We compare our algorithm to Product Quantization (PQ)~\cite{jegou2011product}, a state-of-the-art heuristic metric compression method. We evaluate both algorithms on several data sets: SIFT (used in \cite{jegou2011product}), MNIST~\cite{lecun1998mnist}, New York City taxi time series~\cite{guha2016robust} and a synthetic one-dimensional data set embedded in a high-dimensional space. With appropriately tuned…
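For readers unfamiliar with the PQ baseline named in the abstract, the following is a minimal pure-Python sketch of the general Product Quantization idea: split each vector into blocks, learn a small k-means codebook per block, and store only the index of the nearest codeword in each block. It is not the paper's algorithm and not the implementation used in its experiments. The function names (`kmeans`, `pq_train`, `pq_encode`, `pq_decode`), the parameters `m` and `k`, and the random test data are all hypothetical choices made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=10, rng=None):
    """Plain Lloyd's k-means; returns k centroids as lists of floats."""
    rng = rng or random.Random(0)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            buckets[j].append(p)
        for j, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a cluster goes empty
                centroids[j] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids

def pq_train(points, m, k, seed=0):
    """Split each point into m equal blocks; learn a k-word codebook per block."""
    d = len(points[0])
    assert d % m == 0, "dimension must be divisible by the number of blocks"
    w = d // m
    rng = random.Random(seed)
    return [kmeans([p[i * w:(i + 1) * w] for p in points], k, rng=rng)
            for i in range(m)]

def pq_encode(point, codebooks):
    """Replace each block by the index of its nearest codeword."""
    w = len(point) // len(codebooks)
    return [min(range(len(cb)),
                key=lambda c: sq_dist(point[i * w:(i + 1) * w], cb[c]))
            for i, cb in enumerate(codebooks)]

def pq_decode(code, codebooks):
    """Concatenate the chosen codewords to get an approximate point."""
    return [x for idx, cb in zip(code, codebooks) for x in cb[idx]]

# Illustrative run on synthetic data (assumed, not from the paper).
data_rng = random.Random(1)
data = [[data_rng.random() for _ in range(8)] for _ in range(200)]
books = pq_train(data, m=4, k=16)
codes = [pq_encode(p, books) for p in data]
approx = [pq_decode(c, books) for c in codes]
mean_sq_error = sum(sq_dist(p, a) for p, a in zip(data, approx)) / len(data)
```

With `m=4` blocks and `k=16` codewords per block, each point is stored as four 4-bit indices, i.e. 16 bits, plus the codebooks shared by the whole data set. Distances between points can then be estimated from the decoded vectors. Unlike the method described in the abstract, this scheme comes with no worst-case bound on the distance error.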
Taxonomy
Topics: Algorithms and Data Compression · Advanced Data Compression Techniques · Data Management and Algorithms
