Per-Flow Cardinality Estimation Based On Virtual LogLog Sketching
Zeyu Zhou

TL;DR
This thesis examines the estimation step of the virtual LogLog algorithm for counting distinct elements in data flows. It proposes a generalized estimator family and a maximum-likelihood estimator, and finds empirical evidence that the original vHLL estimator is near-optimal.
Contribution
It introduces a generalized family of estimators for the virtual LogLog algorithm and derives a maximum-likelihood estimator, using both to assess how close the original vHLL estimator is to optimal.
Findings
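Empirical evidence suggests the vHLL estimator is near-optimal for per-flow cardinality estimation
The generalized estimator family gives a set of alternatives against which the vHLL estimator is evaluated
The maximum-likelihood estimator offers an alternative solution to the estimation problem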
Abstract
Flow cardinality estimation is the problem of estimating the number of distinct elements in a data flow, often under a stringent memory constraint. It has wide applications in network traffic measurement and in database systems. The virtual LogLog algorithm, recently proposed by Xiao, Chen, Chen and Ling, estimates the cardinalities of a large number of flows using compact memory. The purpose of this thesis is to explore two new perspectives on the estimation process of this algorithm. First, we propose and investigate a family of estimators that generalizes the original vHLL estimator, and we evaluate the performance of the vHLL estimator against the other estimators in this family. Second, we propose an alternative solution to the estimation problem by deriving a maximum-likelihood estimator. Empirical evidence from both perspectives suggests the near-optimality of the vHLL estimator…
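The register-sharing idea behind per-flow estimation with a shared sketch can be illustrated with a short Python example. This is a minimal sketch under stated assumptions, not code from the thesis: the names `VirtualHLL`, `h64`, `hll_estimate` and `demo`, the BLAKE2b hash, the array sizes, and the noise-corrected estimate (m·s/(m−s))·(n_s/s − n/m), written here from the commonly cited description of vHLL, are all illustrative choices that do not appear in the text above.

```python
import hashlib
import math


def h64(*parts):
    """Deterministic 64-bit hash of the given parts (illustrative choice)."""
    data = "|".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def hll_estimate(registers):
    """HyperLogLog estimate with the usual small-range (linear counting) fix."""
    m = len(registers)
    alpha = 0.7213 / (1 + 1.079 / m)  # standard constant, valid for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros > 0:
        return m * math.log(m / zeros)
    return raw


class VirtualHLL:
    """Hypothetical shared sketch: m physical registers, s virtual per flow."""

    def __init__(self, m=1 << 16, s=512):
        assert s & (s - 1) == 0, "s is assumed to be a power of two"
        self.m, self.s = m, s
        self.bits = s.bit_length() - 1  # hash bits used to pick a register
        self.regs = [0] * m

    def _physical_index(self, flow, i):
        # The i-th virtual register of a flow maps to a pseudo-random slot.
        return h64("reg", flow, i) % self.m

    def add(self, flow, element):
        x = h64("elem", flow, element)
        i = x & (self.s - 1)            # which virtual register to update
        rest = x >> self.bits           # remaining bits give the rank
        # rank = trailing zeros of `rest` plus one (geometric distribution)
        rank = (rest & -rest).bit_length() if rest else 64 - self.bits + 1
        p = self._physical_index(flow, i)
        if rank > self.regs[p]:
            self.regs[p] = rank

    def estimate(self, flow):
        virtual = [self.regs[self._physical_index(flow, i)]
                   for i in range(self.s)]
        n_s = hll_estimate(virtual)     # flow plus noise from other flows
        n = hll_estimate(self.regs)     # all elements in the shared array
        m, s = self.m, self.s
        # Subtract the expected noise contributed by the other flows.
        return (m * s / (m - s)) * (n_s / s - n / m)


def demo():
    sketch = VirtualHLL()
    for f in range(200):                # 200 background flows, 500 each
        for j in range(500):
            sketch.add(f"bg{f}", j)
    for j in range(5000):               # target flow, 5000 distinct elements
        sketch.add("target", j)
    return sketch.estimate("target")
```

Calling `demo()` returns an estimate for a flow whose true cardinality is 5000. The result is approximate, and its exact value depends on the hash function and on the sizes chosen for `m` and `s`.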
Taxonomy
Topics: Data Stream Mining Techniques · Network Security and Intrusion Detection · Internet Traffic Analysis and Secure E-voting
