TL;DR
This paper demonstrates that user behavioral histograms can serve as unique fingerprints, enabling high-accuracy user identification across various datasets, and evaluates factors influencing identification success and privacy defenses.
Contribution
It evaluates the effectiveness of an optimal user identification algorithm on multiple datasets, highlighting the potential for re-identification using behavioral histograms and analyzing privacy measures.
Findings
High identification accuracy using behavioral histograms across datasets
Simultaneous identification outperforms one-by-one matching
Factors like data duration and resolution significantly impact accuracy
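To illustrate the second finding, here is a minimal sketch of why matching all users at once can beat matching them one at a time. The cost matrix is made up for illustration, and the brute-force search over permutations is a stand-in for a proper assignment solver; neither is taken from the paper.

```python
from itertools import permutations

def one_by_one(cost):
    """Match each anonymized row to its cheapest known column, independently."""
    return [min(range(len(row)), key=row.__getitem__) for row in cost]

def simultaneous(cost):
    """Pick the one-to-one assignment with minimum total cost.

    Brute force over all permutations: fine for a toy example only.
    """
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return list(best)

# Hypothetical dissimilarity matrix: cost[i][j] compares anonymized user i
# with known user j. By construction, the true match for row i is column i.
cost = [
    [0.10, 0.40, 0.90],
    [0.12, 0.20, 0.80],  # row 1 looks slightly closer to known user 0
    [0.70, 0.60, 0.30],
]

print(one_by_one(cost))    # two rows claim the same known user
print(simultaneous(cost))  # one-to-one constraint resolves the conflict
```

In this toy case, independent matching assigns two anonymized users to the same known user, while the joint assignment uses the one-to-one constraint to recover the intended pairing.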
Abstract
Most users of online services have unique behavioral or usage patterns. These patterns can be exploited to identify and track users using only their observed behavior. We study the task of identifying users from statistics of their behavioral patterns. Specifically, we focus on the setting in which we are given histograms of users' data collected during two different experiments. We assume that the users' identities are anonymized or hidden in the first dataset and known in the second. We study the task of identifying the users by matching the histograms of their data in the first dataset with the histograms from the second dataset. Recent work introduced the optimal algorithm for this user identification task. In this paper, we evaluate the effectiveness of this method on three different types of datasets and in…
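The histogram-matching setting described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: the event logs, user names, and bin labels are invented, and the Jensen-Shannon divergence is a stand-in dissimilarity measure chosen here for simplicity.

```python
from collections import Counter
from math import log

def histogram(events, bins):
    """Normalized histogram of categorical events over a fixed bin order."""
    counts = Counter(events)
    total = len(events)
    return [counts[b] / total for b in bins]

def js_divergence(p, q):
    """Jensen-Shannon divergence (assumed measure, not from the paper)."""
    def kl(a, b):
        return sum(x * log(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

bins = ["news", "video", "mail"]  # hypothetical behavior categories

# First dataset: identities hidden. Second dataset: identities known.
anonymized = {
    "anon_a": ["news", "news", "video", "news"],
    "anon_b": ["mail", "mail", "video", "mail"],
}
known = {
    "alice": ["news", "video", "news", "news", "news"],
    "bob": ["mail", "mail", "mail", "video"],
}

anon_hists = {u: histogram(e, bins) for u, e in anonymized.items()}
known_hists = {u: histogram(e, bins) for u, e in known.items()}

# Match each anonymized histogram to the closest known histogram.
matches = {
    a: min(known_hists, key=lambda k: js_divergence(h, known_hists[k]))
    for a, h in anon_hists.items()
}
print(matches)
```

Each anonymized user is linked to the known user whose histogram is closest under the chosen divergence; with real data, the bins would correspond to whatever behavior the dataset records.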
Videos
"Anonymous" Location Data Problems - Computerphile (YouTube)
