Compressive Mining: Fast and Optimal Data Mining in the Compressed Domain
Michail Vlachos, Nikolaos Freris, Anastasios Kyrillidis

TL;DR
This paper introduces a method for accurately estimating distances directly in the compressed domain, enabling more precise data mining operations like clustering and search without full data reconstruction.
Contribution
It formulates and solves an optimization problem for the tightest lower and upper bounds on the Euclidean distance between compressed data objects, improving accuracy over existing techniques.
Findings
Tighter distance bounds lead to more accurate k-NN and clustering.
The proposed method outperforms PCA and random projections in certain sparse data scenarios.
An exact, fast algorithm for optimal distance estimation is developed.
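To illustrate why tighter bounds help nearest-neighbor search, here is a minimal sketch of the standard bound-based pruning step. It is not taken from the paper: the function name `prune_candidates` and the sample intervals are hypothetical, and the only assumption is that each candidate comes with a valid (lower, upper) bound on its distance to the query.

```python
def prune_candidates(bounds):
    """Return the candidates that may still be the nearest neighbor.

    bounds: dict mapping a candidate name to a (lower, upper) pair that
    brackets its true distance to the query (hypothetical input format).
    """
    # The true nearest neighbor is at most this far from the query.
    best_upper = min(upper for _, upper in bounds.values())
    # A candidate whose lower bound exceeds best_upper cannot be the nearest.
    return [name for name, (lower, _) in bounds.items() if lower <= best_upper]


# Illustrative intervals only: the same three objects with loose and tight bounds.
loose = {"a": (1.0, 2.0), "b": (2.5, 4.0), "c": (1.5, 3.0)}
tight = {"a": (1.4, 1.6), "b": (3.0, 3.2), "c": (1.7, 2.0)}
print(prune_candidates(loose))
print(prune_candidates(tight))
```

With narrower intervals fewer candidates survive the pruning step, so fewer objects need to be decompressed and compared exactly.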
Abstract
Real-world data typically contain repeated and periodic patterns. This suggests that they can be effectively represented and compressed using only a few coefficients of an appropriate basis (e.g., Fourier, Wavelets, etc.). However, distance estimation when the data are represented using different sets of coefficients is still a largely unexplored area. This work studies the optimization problems related to obtaining the tightest lower/upper bound on Euclidean distances when each data object is potentially compressed using a different set of orthonormal coefficients. Our technique leads to tighter distance estimates, which translates into more accurate search, learning and mining operations directly in the compressed domain. We formulate the problem of estimating lower/upper distance bounds as an optimization problem. We establish the properties of optimal solutions,…
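The setting described in the abstract can be sketched with a simple baseline. The code below is not the paper's optimal algorithm; it is a loose but valid Cauchy-Schwarz bound, written under the assumptions that each object keeps its k largest-magnitude coefficients in an orthonormal DCT basis and also stores the total energy of the discarded coefficients. All function names are hypothetical.

```python
import math


def dct_orthonormal(x):
    """Orthonormal DCT-II of a real sequence (preserves Euclidean distances)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def compress(coeffs, k):
    """Keep the k largest-magnitude coefficients and the discarded energy."""
    order = sorted(range(len(coeffs)), key=lambda i: -abs(coeffs[i]))
    kept = {i: coeffs[i] for i in order[:k]}
    discarded_energy = sum(coeffs[i] ** 2 for i in order[k:])
    return kept, discarded_energy


def distance_bounds(cx, cy):
    """Lower/upper bound on the Euclidean distance of two compressed objects.

    Baseline only: bounds the unknown part of the inner product with
    Cauchy-Schwarz, which is valid but generally not the tightest bound.
    """
    (kx, ex), (ky, ey) = cx, cy
    energy_x = sum(v * v for v in kx.values()) + ex
    energy_y = sum(v * v for v in ky.values()) + ey
    common = kx.keys() & ky.keys()
    # Part of the inner product that is known exactly.
    known = sum(kx[i] * ky[i] for i in common)
    # Kept coefficients whose counterpart in the other object was discarded.
    ax = math.sqrt(sum(kx[i] ** 2 for i in kx.keys() - common))
    ay = math.sqrt(sum(ky[i] ** 2 for i in ky.keys() - common))
    slack = ax * math.sqrt(ey) + ay * math.sqrt(ex) + math.sqrt(ex * ey)
    total = energy_x + energy_y
    lower_sq = max(0.0, total - 2.0 * (known + slack))
    upper_sq = max(0.0, total - 2.0 * (known - slack))
    return math.sqrt(lower_sq), math.sqrt(upper_sq)


# Two synthetic periodic signals, each compressed with its own coefficient set.
x = [math.sin(2 * math.pi * t / 16) + 0.1 * t for t in range(32)]
y = [math.cos(2 * math.pi * t / 8) for t in range(32)]
lower, upper = distance_bounds(compress(dct_orthonormal(x), 4),
                               compress(dct_orthonormal(y), 4))
true_distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
print(lower, true_distance, upper)
```

Because the two objects keep different coefficient positions, only the overlapping positions contribute an exactly known term; the gap between the two bounds shrinks as more coefficients are kept and vanishes when nothing is discarded.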
Taxonomy
Topics: Machine Learning and Algorithms · Sparse and Compressive Sensing Techniques · Algorithms and Data Compression
Methods: k-Nearest Neighbors
