Testing frequency distributions in a stream
Claire Mathieu, Michel de Rougemont

TL;DR
This paper introduces methods to verify whether the frequency distribution of a data stream matches a target distribution, using a new distance measure. It gives a space-efficient tester for fast-decreasing target distributions and a space lower bound for the uniform case.
Contribution
It proposes the relative Fréchet distance for comparing frequency functions and develops streaming testers for closeness in two models: insertions only and sliding windows.
Findings
Testing against a uniform target distribution requires Ω(n) space, where n is the number of distinct items in the universe.
When the target distribution decreases fast enough, the tester uses only O(log^2 n · log log n) space.
The approach combines the Spacesaving algorithm with stream sampling.
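As background for the last finding, here is a minimal Python sketch of the standard SpaceSaving heavy-hitters algorithm. It is not the paper's tester: the function name `space_saving` and the parameter `k` (the number of counters) are illustrative choices, and the linear scan for the minimum counter is a simplification that a production implementation would replace with a dedicated summary structure.

```python
from typing import Dict, Hashable, Iterable


def space_saving(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """Approximate item counts using at most k counters (SpaceSaving)."""
    if k <= 0:
        raise ValueError("k must be positive")
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            # Monitored item: increment its counter.
            counters[item] += 1
        elif len(counters) < k:
            # Free slot: start monitoring the item.
            counters[item] = 1
        else:
            # No free slot: evict the item with the smallest counter;
            # the newcomer inherits that count plus one.
            victim = min(counters, key=counters.get)  # O(k) scan, for brevity
            counters[item] = counters.pop(victim) + 1
    return counters


if __name__ == "__main__":
    print(space_saving(["a", "b", "a", "c", "a", "d", "a"], k=2))
```

Each stream item raises the total of the counters by exactly one, so the counters always sum to the stream length, and a monitored item's counter never underestimates its true frequency.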
Abstract
We study how to verify specific frequency distributions when we observe a stream of $N$ data items taken from a universe of $n$ distinct items. We introduce the \emph{relative Fr\'echet distance} to compare two frequency functions in a homogeneous manner. We consider two streaming models: insertions only and sliding windows. We present a Tester for a certain class of functions, which decides if $f$ is close to $g$ or if $f$ is far from $g$ with high probability, when $f$ is given and $g$ is defined by a stream. If $f$ is uniform we show a $\Omega(n)$ space lower bound. If $f$ decreases fast enough, we then only use space $O(\log^2 n \cdot \log\log n)$. The analysis relies on the Spacesaving algorithm \cite{MAE2005,Z22} and on sampling the stream.
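The abstract does not say how the stream is sampled. As one standard possibility, and purely as an assumption, the sketch below uses reservoir sampling (Algorithm R) to keep a uniform sample of fixed size from a stream of unknown length. The name `reservoir_sample` and its parameters are hypothetical, and the paper's actual sampling scheme may differ.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Keep a uniform random sample of up to k items from a stream."""
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    print(reservoir_sample(range(1000), k=10, rng=random.Random(0)))
```

The memory used depends only on the sample size `k`, not on the stream length, which is what makes this kind of sampling compatible with small-space streaming algorithms.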
Taxonomy
Topics: Machine Learning and Algorithms · Algorithms and Data Compression · Advanced Database Systems and Queries
