Better size estimation for sparse matrix products
Rasmus Resen Amossen, Andrea Campagna, Rasmus Pagh

TL;DR
This paper introduces a new, faster method for estimating the number of non-zero entries in sparse boolean matrix products, improving efficiency over previous algorithms and demonstrating practical accuracy through experiments.
Contribution
It presents a size estimation algorithm that combines hash functions with sampling, runs in expected linear time, and is accompanied by matching space lower bounds, with practical improvements over prior methods.
Findings
Expected time O(n) for a 1 ± ε approximation with small error probability, for any ε > 4/\sqrt[4]{z}
Significantly better accuracy in experiments on real data
Matching space lower bounds for size estimation
Abstract
We consider the problem of doing fast and reliable estimation of the number of non-zero entries in a sparse boolean matrix product. This problem has applications in databases and computer algebra. Let n denote the total number of non-zero entries in the input matrices. We show how to compute a 1 ± ε approximation (with small probability of error) in expected time O(n) for any ε > 4/\sqrt[4]{z}. The previously best estimation algorithm, due to Cohen (JCSS 1997), uses time O(n/ε^2). We also present a variant using O(sort(n)) I/Os in expectation in the cache-oblivious model. In contrast to these results, the currently best algorithms for computing a sparse boolean matrix product use time ω(n^{4/3}) (resp. ω(n^{4/3}/B) I/Os), even if the result matrix has only z=O(n) nonzero entries. Our algorithm combines the size estimation technique of Bar-Yossef et al. (RANDOM…
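The estimation idea named in the abstract, keeping only the smallest hash values of the output entries in the style of Bar-Yossef et al., can be illustrated with a deliberately naive sketch. The Python below is not the authors' algorithm: it enumerates every elementary product, so it does not run in O(n) time, and it uses a plain linear hash modulo a prime as a stand-in for the hash functions the paper relies on. The function names, the parameter `k`, and the estimate `k / v_k` (with `v_k` the k-th smallest hash value scaled to [0, 1)) are assumptions made for illustration.

```python
import heapq
import random
from collections import defaultdict

P = (1 << 61) - 1  # a Mersenne prime used as the hash range (illustrative choice)


def _rows_by_inner_index(a_entries):
    """Group the non-zero entries (i, j) of A by their inner index j."""
    rows = defaultdict(list)
    for i, j in a_entries:
        rows[j].append(i)
    return rows


def exact_product_size(a_entries, b_entries):
    """Exact number of non-zero entries of the boolean product A*B (baseline)."""
    rows = _rows_by_inner_index(a_entries)
    out = set()
    for j, c in b_entries:
        for i in rows.get(j, ()):
            out.add((i, c))
    return len(out)


def estimate_product_size(a_entries, b_entries, k=256, seed=0):
    """Estimate the same quantity from the k smallest distinct hash values.

    Naive illustration: every elementary product is still enumerated.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)
    c1, c2, c0 = (rng.randrange(1, P) for _ in range(3))
    rows = _rows_by_inner_index(a_entries)

    kept = set()  # the (at most) k smallest distinct hash values seen so far
    heap = []     # the same values, negated, so heap[0] is the largest kept one
    for j, c in b_entries:
        for i in rows.get(j, ()):
            h = (c1 * i + c2 * c + c0) % P  # hash of output position (i, c)
            if h in kept:
                continue  # the same output entry was produced via another j
            if len(heap) < k:
                kept.add(h)
                heapq.heappush(heap, -h)
            elif h < -heap[0]:
                evicted = -heapq.heappushpop(heap, -h)
                kept.discard(evicted)
                kept.add(h)

    if len(heap) < k:
        # Fewer than k distinct hash values: the count itself is returned.
        return float(len(heap))
    v_k = -heap[0] / P  # k-th smallest hash value, scaled to [0, 1)
    return k / v_k


if __name__ == "__main__":
    # Sparse matrices given as lists of (row, column) positions of the 1-entries.
    a = [(0, 0), (0, 1), (1, 1)]
    b = [(0, 0), (1, 0), (1, 2)]
    print(exact_product_size(a, b), estimate_product_size(a, b))
```

When the product has fewer than `k` distinct non-zero positions, the sketch returns their exact count (barring a hash collision); otherwise a larger `k` trades more memory for a tighter estimate.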
Taxonomy
Topics: Algorithms and Data Compression · Machine Learning and Algorithms · Optimization and Search Problems
