No Repetition: Fast Streaming with Highly Concentrated Hashing
Anders Aamand, Debarati Das, Evangelos Kipouridis, Jakob B. T. Knudsen, Peter M. R. Rasmussen, Mikkel Thorup

TL;DR
This paper demonstrates that using hash functions with strong concentration bounds allows for high-probability estimators in streaming algorithms without repetitions, leading to faster and simpler algorithms.
Contribution
It introduces a method to achieve high-probability bounds in streaming estimators using a single hash function with strong concentration, eliminating the need for multiple independent repetitions.
Findings
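A single hash function with strong concentration bounds achieves the exponentially small error probability otherwise obtained through independent repetitions
Algorithms become faster and simpler with strongly concentrated hashing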
Suitable for online processing of high-volume data streams
Abstract
To get estimators that work within a certain error bound with high probability, a common strategy is to design one that works with constant probability, and then boost the probability using independent repetitions. Important examples of this approach are small-space algorithms for estimating the number of distinct elements in a stream, or estimating the set similarity between large sets. Using standard strongly universal hashing to process each element, we get a sketch-based estimator where the probability of too large an error is, say, 1/4. By performing r independent repetitions and taking the median of the estimators, the error probability falls exponentially in r. However, running r independent experiments increases the processing time by a factor r. Here we make the point that if we have a hash function with strong concentration bounds, then we get the same high…
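The repetition strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm. It assumes a bottom-k distinct-elements estimator (chosen here only for illustration), integer keys, and the multiply-mod-prime family as the strongly universal hash. The names make_hash, bottom_k_estimate, and median_estimate are hypothetical.

```python
import heapq
import random
import statistics

P = (1 << 61) - 1  # Mersenne prime; keys are assumed to be integers in [0, P)


def make_hash(rng):
    """Draw h(x) = ((a*x + b) mod P) / P, a strongly universal hash into [0, 1)."""
    a = rng.randrange(P)
    b = rng.randrange(P)
    return lambda x: ((a * x + b) % P) / P


def bottom_k_estimate(stream, h, k):
    """Estimate the number of distinct keys from the k smallest hash values."""
    heap = []     # max-heap (values stored negated) of the k smallest hashes
    kept = set()  # the same hash values, used to skip repeated keys
    for x in stream:
        v = h(x)
        if v in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -v)
            kept.add(v)
        elif v < -heap[0]:
            # v replaces the current k-th smallest hash value
            evicted = -heapq.heappushpop(heap, -v)
            kept.discard(evicted)
            kept.add(v)
    if len(heap) < k:
        return float(len(heap))  # fewer than k distinct hashes: count directly
    return (k - 1) / -heap[0]    # -heap[0] is the k-th smallest hash value


def median_estimate(stream, k, r, seed=0):
    """Run r independent repetitions and return the median estimate.

    Every element is hashed r times, which is the factor-r slowdown
    that the abstract points out.
    """
    rng = random.Random(seed)
    items = list(stream)
    return statistics.median(
        bottom_k_estimate(items, make_hash(rng), k) for _ in range(r)
    )
```

For example, median_estimate(stream, k=256, r=9) hashes each element nine times. The paper's point is that a single hash function with strong concentration bounds makes these repetitions unnecessary.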
Taxonomy
Topics: Algorithms and Data Compression · Data Management and Algorithms · Data Stream Mining Techniques
