No Repetition: Fast Streaming with Highly Concentrated Hashing

Anders Aamand; Debarati Das; Evangelos Kipouridis; Jakob B. T. Knudsen; Peter M. R. Rasmussen; Mikkel Thorup

arXiv:2004.01156 · cs.DS · April 3, 2020 · 1 cites

Open Access

TL;DR

This paper demonstrates that using hash functions with strong concentration bounds allows for high-probability estimators in streaming algorithms without repetitions, leading to faster and simpler algorithms.

Contribution

It introduces a method to achieve high-probability bounds in streaming estimators using a single hash function with strong concentration, eliminating the need for multiple independent repetitions.

Findings

01. A single hash function with strong concentration bounds gives an exponentially small error probability, without independent repetitions

02. Algorithms become faster and simpler with strongly concentrated hashing

03. Suitable for online processing of high-volume data streams

Abstract

To get estimators that work within a certain error bound with high probability, a common strategy is to design one that works with constant probability, and then boost the probability using independent repetitions. Important examples of this approach are small-space algorithms for estimating the number of distinct elements in a stream, or estimating the set similarity between large sets. Using standard strongly universal hashing to process each element, we get a sketch-based estimator where the probability of too large an error is, say, 1/4. By performing $r$ independent repetitions and taking the median of the estimators, the error probability falls exponentially in $r$. However, running $r$ independent experiments increases the processing time by a factor $r$. Here we make the point that if we have a hash function with strong concentration bounds, then we get the same high…
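The repetition-and-median baseline that the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's method: the choice of a k-minimum-values (bottom-k) distinct-elements estimator, the multiply-add hash modulo a Mersenne prime, and all names and parameters (`make_hash`, `kmv_estimate`, `median_of_repetitions`, `k`, `r`) are assumptions made for the example. Keys are assumed to be non-negative integers smaller than `P`.

```python
import heapq
import random
import statistics

P = (1 << 61) - 1  # Mersenne prime used as the hash range (assumption for this sketch)


def make_hash(rng):
    """Draw a strongly universal (2-independent) hash h(x) = (a*x + b) mod P."""
    a = rng.randrange(P)
    b = rng.randrange(P)
    return lambda x: (a * x + b) % P


def kmv_estimate(stream, h, k):
    """Estimate the number of distinct keys from the k smallest hash values."""
    heap = []        # max-heap via negation: holds the k smallest hashes seen
    members = set()  # the same k values, for duplicate detection
    for x in stream:
        v = h(x)
        if v in members:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -v)
            members.add(v)
        elif v < -heap[0]:
            # Insert v and evict the current largest of the k smallest.
            evicted = -heapq.heappushpop(heap, -v)
            members.discard(evicted)
            members.add(v)
    if len(heap) < k:
        return float(len(heap))  # fewer than k distinct hashes were seen
    kth_smallest = -heap[0]
    return (k - 1) * P / kth_smallest


def median_of_repetitions(stream, k, r, seed=0):
    """Run r independent estimators and return the median of their outputs.

    Every element is hashed r times, which is the factor-r slowdown
    mentioned in the abstract.
    """
    rng = random.Random(seed)
    items = list(stream)
    estimates = [kmv_estimate(items, make_hash(rng), k) for _ in range(r)]
    return statistics.median(estimates)
```

A call such as `median_of_repetitions(stream, k=256, r=9)` hashes every element nine times. The point made in the abstract is that a single hash function with strong concentration bounds can avoid this repeated work.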

Taxonomy

Topics: Algorithms and Data Compression · Data Management and Algorithms · Data Stream Mining Techniques