HyperLogLog (HLL) Security: Inflating Cardinality Estimates
Pedro Reviriego, Pablo Adell, Daniel Ting

TL;DR
This paper investigates how attackers can manipulate HyperLogLog into producing inflated cardinality estimates, which could trigger false alarms or artificially inflate reported website traffic, and validates these vulnerabilities in real implementations.
Contribution
The paper reveals a new security vulnerability in HyperLogLog that enables attackers to inflate cardinality estimates, and it evaluates protection strategies against such attacks.
Findings
Attackers can create small sets that produce an arbitrarily large estimate.
Validated vulnerabilities in Presto and Redis HyperLogLog implementations.
Potential for misuse in inflating website visits or triggering false alarms.
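The first finding can be illustrated with a minimal sketch. It is not the paper's procedure: it assumes a toy HyperLogLog with 64 registers, an unsalted SHA-256-based hash that the attacker knows, and hypothetical names (`index_and_rank`, `estimate`, `craft_inflating_set`) chosen only for illustration. The attacker hashes many candidate elements offline and keeps, for each register, only the one element that sets the highest value. The resulting set holds at most 64 elements yet leaves the registers in exactly the state that all the candidates would.

```python
import hashlib
import math

P = 6            # precision bits (assumed for this sketch)
M = 1 << P       # number of registers, m = 64
ALPHA = 0.709    # standard HyperLogLog bias constant for m = 64


def index_and_rank(item):
    """Map an item to (register index, rank) using a 64-bit hash prefix."""
    h = int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "big")
    idx = h >> (64 - P)                      # first P bits select the register
    rest = h & ((1 << (64 - P)) - 1)         # remaining 58 bits
    rank = (64 - P) - rest.bit_length() + 1  # leading zeros in `rest`, plus one
    return idx, rank


def estimate(items):
    """Toy HyperLogLog estimate with the small-range (linear counting) correction."""
    regs = [0] * M
    for it in items:
        idx, rank = index_and_rank(it)
        regs[idx] = max(regs[idx], rank)
    raw = ALPHA * M * M / sum(2.0 ** -r for r in regs)
    zeros = regs.count(0)
    if raw <= 2.5 * M and zeros > 0:
        return M * math.log(M / zeros)
    return raw


def craft_inflating_set(candidates):
    """Keep, per register, only the candidate with the largest rank (hypothetical attack helper)."""
    best = {}
    for it in candidates:
        idx, rank = index_and_rank(it)
        if idx not in best or rank > best[idx][0]:
            best[idx] = (rank, it)
    return [it for _, it in best.values()]


candidates = [f"user-{i}" for i in range(20000)]
crafted = craft_inflating_set(candidates)
print(len(crafted), estimate(crafted), estimate(candidates))
```

Because the crafted set and the full candidate list produce identical registers, the two estimates coincide even though the crafted set is far smaller. A keyed or salted hash that the attacker cannot evaluate would defeat this particular sketch, which relies entirely on the attacker computing the hash offline.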
Abstract
Counting the number of distinct elements in a set is needed in many applications, for example to track the number of unique users in Internet services or the number of distinct flows on a network. In many cases, an estimate rather than the exact value is sufficient, and thus many cardinality estimation algorithms that significantly reduce memory and computation requirements have been proposed. Among them, HyperLogLog has been widely adopted in both software and hardware implementations. The security of HyperLogLog has recently been studied, showing that an attacker can create a set of elements that produces a cardinality estimate much smaller than the real cardinality of the set. Such a set can be used, for example, to evade detection systems that use HyperLogLog. In this paper, the security of HyperLogLog is considered from the opposite angle: the attacker wants to create a…
Taxonomy
Topics: Network Security and Intrusion Detection · Advanced Malware Detection Techniques · Internet Traffic Analysis and Secure E-voting
