Sketch ⋆-metric: Comparing Data Streams via Sketching
Emmanuelle Anceaume (IRISA), Yann Busnel (LINA)

TL;DR
This paper introduces the Sketch ⋆-metric, a new method for efficiently estimating the distance between large data streams from their sketches; it preserves the key properties of the underlying measure and is validated through extensive experiments.
Contribution
The paper proposes the Sketch ⋆-metric, a novel metric that compares data stream sketches while preserving the axioms of the underlying measure, enabling efficient and accurate stream analysis.
Findings
The Sketch ⋆-metric accurately estimates stream distances.
It preserves measure axioms like non-negativity and symmetry.
Experiments show robustness on synthetic and real data.
Abstract
In this paper, we consider the problem of estimating the distance between any two large data streams under a small-space constraint. This problem is of utmost importance in data-intensive monitoring applications where input streams are generated rapidly. These streams need to be processed on the fly and accurately in order to quickly detect any deviation from nominal behavior. We present a new metric, the Sketch ⋆-metric, which allows us to define a distance between updatable summaries (or sketches) of large data streams. An important feature of the Sketch ⋆-metric is that, given a measure on the entire initial data streams, it preserves the axioms of that measure on the sketch (such as non-negativity, identity, symmetry, and the triangle inequality, but also specific properties of the f-divergence). Extensive experiments conducted on both synthetic traces…
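The abstract does not spell out how the metric is computed, so the following is only a minimal illustration of the general idea of comparing streams through small updatable summaries, not the authors' construction. It assumes each sketch hashes items into k buckets under a few seeded hash functions, and that the distance is the largest total variation distance observed between the bucketed frequency distributions. The names `Sketch`, `build`, `bucket`, and `sketch_distance`, the choice of BLAKE2b as hash, and the choice of total variation as the underlying measure are all assumptions made for this example. Because merging items into buckets can never increase total variation, the value returned is a lower bound on the distance between the full streams, and it remains non-negative and symmetric.

```python
import hashlib


def bucket(item, seed, k):
    """Map an item to one of k buckets with a seeded hash (illustrative choice)."""
    digest = hashlib.blake2b(
        repr(item).encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % k


class Sketch:
    """Hypothetical updatable summary: one row of k counters per hash seed."""

    def __init__(self, k, seeds):
        self.k = k
        self.seeds = list(seeds)
        self.rows = [[0] * k for _ in self.seeds]
        self.n = 0  # number of items seen so far

    def update(self, item):
        # Each arriving item increments one counter per row.
        for row, seed in zip(self.rows, self.seeds):
            row[bucket(item, seed, self.k)] += 1
        self.n += 1


def build(stream, k, seeds):
    """Feed a whole (non-empty) stream into a fresh sketch."""
    sketch = Sketch(k, seeds)
    for item in stream:
        sketch.update(item)
    return sketch


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def sketch_distance(s1, s2, dist=total_variation):
    """Largest per-row distance between two sketches built with the same k and seeds."""
    best = 0.0
    for r1, r2 in zip(s1.rows, s2.rows):
        p = [c / s1.n for c in r1]  # bucketed empirical distribution of stream 1
        q = [c / s2.n for c in r2]  # bucketed empirical distribution of stream 2
        best = max(best, dist(p, q))
    return best
```

Both sketches must be built with the same `k` and the same seeds; otherwise their rows are not comparable. Taking the maximum over rows keeps the tightest of the available lower bounds, and any other measure on distributions could be passed in through the `dist` parameter.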
Taxonomy
Topics: Data Management and Algorithms · Data Stream Mining Techniques · Advanced Database Systems and Queries
