Sampling Methods for Inner Product Sketching
Majid Daliri, Juliana Freire, Christopher Musco, Aécio Santos, Haoxiang Zhang

TL;DR
This paper introduces two efficient sampling-based sketching methods for inner product estimation that outperform traditional linear sketching techniques in both theory and practice, especially for correlation estimation tasks.
Contribution
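The authors develop and analyze two new sampling-based sketching methods for inner product estimation. Unlike the computationally expensive coordinated weighted sampling algorithm of Bessa et al., the sketches can be computed in linear time, and in practice they are more accurate than existing linear sketching approaches.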
Findings
Our methods run in linear time for sketch computation.
They outperform linear sketching methods on various tasks.
They achieve state-of-the-art results for estimating the correlation between columns in unjoined tables.
Abstract
Recently, Bessa et al. (PODS 2023) showed that sketches based on coordinated weighted sampling theoretically and empirically outperform popular linear sketching methods like Johnson-Lindenstrauss projection and CountSketch for the ubiquitous problem of inner product estimation. We further develop this finding by introducing and analyzing two alternative sampling-based methods. In contrast to the computationally expensive algorithm in Bessa et al., our methods run in linear time (to compute the sketch) and perform better in practice, significantly beating linear sketching on a variety of tasks. For example, they provide state-of-the-art results for estimating the correlation between columns in unjoined tables, a problem that we show how to reduce to inner product estimation in a black-box way. While based on known sampling techniques (threshold and priority sampling), we introduce…
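To make the idea concrete, here is a minimal Python sketch of coordinated threshold sampling for inner product estimation. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the sampling probabilities or the estimator, so the choice p_i = min(1, m * v_i^2 / ||v||^2), the SHA-256 based shared hash, and the names `shared_uniform`, `threshold_sketch`, and `estimate_inner_product` are all hypothetical. Vectors are represented as sparse dictionaries from index to value.

```python
import hashlib
from typing import Dict, Tuple

Sketch = Dict[int, Tuple[float, float]]  # index -> (value, inclusion probability)


def shared_uniform(index: int, seed: int = 0) -> float:
    """Hash an index to a value in [0, 1) that every sketch with this seed shares."""
    digest = hashlib.sha256(f"{seed}:{index}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def threshold_sketch(vec: Dict[int, float], m: int, seed: int = 0) -> Sketch:
    """Keep index i when its shared hash falls below p_i (assumed form, see lead-in).

    One pass to compute the squared norm and one pass to sample,
    so the cost is linear in the number of stored entries.
    The expected sketch size is at most m.
    """
    sq_norm = sum(v * v for v in vec.values())
    sketch: Sketch = {}
    for i, v in vec.items():
        if v == 0.0:
            continue  # zero entries never contribute to an inner product
        p = min(1.0, m * v * v / sq_norm)
        if shared_uniform(i, seed) < p:
            sketch[i] = (v, p)
    return sketch


def estimate_inner_product(sa: Sketch, sb: Sketch) -> float:
    """Sum a_i * b_i over indices kept in both sketches, reweighted by 1 / min(p_a, p_b).

    Because both sketches use the same hash value for index i, the index
    appears in both with probability min(p_a, p_b), so the sum is unbiased.
    """
    total = 0.0
    for i, (va, pa) in sa.items():
        if i in sb:
            vb, pb = sb[i]
            total += va * vb / min(pa, pb)
    return total
```

Each vector is sketched independently; only the seed must be shared. When `m` is large enough that every inclusion probability reaches 1, the estimate coincides with the exact inner product, which is a convenient sanity check.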
Taxonomy
Topics: Advanced Vision and Imaging · Data Visualization and Analytics · Data Management and Algorithms
