# Fast Concurrent Data Sketches

**Authors:** Arik Rinberg, Alexander Spiegelman, Edward Bortnikov, Eshcar Hillel, Idit Keidar, Lee Rhodes, and Hadar Serviansky

arXiv: 1902.10995 · 2019-12-06

## TL;DR

This paper presents a generic algorithm for parallelising data sketches. Multiple threads can build a sketch and query it while it is being built. The extra error that this parallelism introduces is provably bounded, and the approach scales well when processing large data streams in real time.

## Contribution

A generic approach to parallelising data sketches. Its correctness is proven using relaxed semantics and strong linearisability, and the paper analyses the error it induces in two specific sketches.
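
The "relaxed semantics" can be stated concretely with the notion of an r-relaxed query: a query that may ignore up to r of the updates that precede it. The following is an illustration only. It assumes a hypothetical design in which each of N update threads holds at most b updates that have not yet reached the shared sketch. The symbols N, b, r, S, M and Q are introduced here for illustration, and the paper's exact bound may differ.

$$
r \le N \cdot b, \qquad \hat{Q}(S) = Q(S \setminus M), \quad M \subseteq S,\ |M| \le r
$$

Here S is the set of updates completed before the query, Q is the sequential sketch's query, and M is the set of updates the concurrent query misses. For example, with N = 4 threads and b = 64, a query may ignore at most 256 updates.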

## Key findings

- The implementation achieves high scalability
- The error introduced by parallelism remains small
- Correctness is proven using relaxed semantics and strong linearisability

## Abstract

Data sketches are approximate succinct summaries of long streams. They are widely used for processing massive amounts of data and answering statistical queries about it in real-time. Existing libraries producing sketches are very fast, but do not allow parallelism for creating sketches using multiple threads or querying them while they are being built. We present a generic approach to parallelising data sketches efficiently, while bounding the error that such parallelism introduces. Utilising relaxed semantics and the notion of strong linearisability we prove our algorithm's correctness and analyse the error it induces in two specific sketches. Our implementation achieves high scalability while keeping the error small.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.10995/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/1902.10995/full.md

## References

27 references; full list in the complete paper: https://tomesphere.com/paper/1902.10995/full.md

---
Source: https://tomesphere.com/paper/1902.10995