Compression for Quadratic Similarity Queries
Amir Ingber, Thomas Courtade, Tsachy Weissman

TL;DR
This paper investigates the fundamental limits of performing quadratic similarity queries on compressed data, establishing thresholds for reliable responses and characterizing the exponential reliability achievable for Gaussian sources and beyond.
Contribution
It provides an explicit characterization of the identification rate for Gaussian sources and introduces a robust scheme that attains the largest required compression rate, that of the Gaussian source, for any source with the same variance.
Findings
For a Gaussian source, queries can be answered reliably if and only if the compression rate exceeds the identification rate.
Exponential reliability of query responses is achievable above the identification rate.
Gaussian sources require the largest compression rate among sources with the same variance.
Abstract
The problem of performing similarity queries on compressed data is considered. We focus on the quadratic similarity measure, and study the fundamental tradeoff between compression rate, sequence length, and reliability of queries performed on compressed data. For a Gaussian source, we show that queries can be answered reliably if and only if the compression rate exceeds a given threshold - the identification rate - which we explicitly characterize. Moreover, when compression is performed at a rate greater than the identification rate, responses to queries on the compressed data can be made exponentially reliable. We give a complete characterization of this exponent, which is analogous to the error and excess-distortion exponents in channel and source coding, respectively. For a general source we prove that, as with classical compression, the Gaussian source requires the largest…
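To make the setting concrete, here is a minimal sketch of a similarity query answered from a compressed signature. It is not the scheme from the paper. It assumes the query asks whether the normalized squared distance ||x - y||^2 / n is at most a threshold d, and that the answer is either "no" (certainly not similar) or "maybe"; both conventions, and the names `compress`, `query`, `step`, and `d`, are assumptions made for illustration. The signature is a uniformly quantized copy of x plus the norm of the quantization residual, and the triangle inequality is used so that a truly similar pair is never answered "no".

```python
import math
import random


def compress(x, step):
    """Hypothetical compressor: uniform scalar quantization of x,
    stored together with the norm of the quantization residual."""
    x_hat = [step * round(v / step) for v in x]
    residual = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)))
    return x_hat, residual


def query(signature, y, d):
    """Answer "no" only when ||x - y||^2 / n > d is guaranteed; else "maybe"."""
    x_hat, residual = signature
    n = len(y)
    dist_hat = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_hat, y)))
    # Triangle inequality: ||x - y|| >= ||x_hat - y|| - ||x - x_hat||.
    lower_bound = dist_hat - residual
    # Small slack (an arbitrary choice) guards against floating-point rounding.
    return "no" if lower_bound > math.sqrt(n * d) + 1e-9 else "maybe"


if __name__ == "__main__":
    random.seed(0)
    n = 64
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    y = [random.gauss(0.0, 1.0) for _ in range(n)]
    sig = compress(x, step=0.5)
    print(query(sig, y, d=1.0))
```

A coarser `step` means fewer distinct quantization levels (a lower rate) but a larger residual, so more queries fall back to "maybe". That is an informal view of the tradeoff between compression rate and query reliability that the abstract describes.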
