Towards matching user mobility traces in large-scale datasets
Dániel Kondor, Behrooz Hashemian, Yves-Alexandre de Montjoye, Carlo Ratti

TL;DR
This paper analyzes how reliably users can be reidentified across two large-scale mobility datasets, one from a mobile network operator and one from transportation smart card usage. Matchability improves with longer observation periods and more frequent data collection, which raises privacy concerns.
Contribution
It provides the first large-scale analysis of user matchability in real mobility datasets, quantifying factors influencing reidentification success over time.
Findings
Matching success reaches 16.8% after one week
Over 55% matchability after four weeks
Higher data collection frequency increases matchability
Abstract
The problem of unicity and reidentifiability of records in large-scale databases has been studied in different contexts and approaches, with focus on preserving privacy or matching records from different data sources. With an increasing number of service providers nowadays routinely collecting location traces of their users on unprecedented scales, there is a pronounced interest in the possibility of matching records and datasets based on spatial trajectories. Extending previous work on reidentifiability of spatial data and trajectory matching, we present the first large-scale analysis of user matchability in real mobility datasets on realistic scales, i.e. among two datasets that consist of several million people's mobility traces, coming from a mobile network operator and transportation smart card usage. We extract the relevant statistical properties which influence the matching…
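To make the idea of matching records by spatial trajectories concrete, here is a minimal sketch in Python. It is not the authors' method: it assumes each record is a hypothetical (user_id, cell_id, time_bin) tuple, and it pairs a user in one dataset with the user in the other dataset who shares the most distinct (cell_id, time_bin) points, accepting the pairing only when that best candidate is unique. The function name match_users and the record layout are illustrative assumptions.

```python
from collections import Counter, defaultdict


def match_users(records_a, records_b):
    """Pair users across two datasets by counting shared space-time points.

    Each record is an assumed (user_id, cell_id, time_bin) tuple.
    Returns a dict mapping a user in A to the best unique candidate in B.
    """
    # Index dataset B: which users were seen at each (cell, time_bin) point.
    index_b = defaultdict(set)
    for user, cell, time_bin in records_b:
        index_b[(cell, time_bin)].add(user)

    # Collect the distinct points visited by each user in dataset A.
    points_a = defaultdict(set)
    for user, cell, time_bin in records_a:
        points_a[user].add((cell, time_bin))

    matches = {}
    for user_a, points in points_a.items():
        # Count how many of this user's points each B user also appears at.
        counts = Counter()
        for point in points:
            for user_b in index_b.get(point, ()):
                counts[user_b] += 1
        if not counts:
            continue  # no co-occurrence at all, so no candidate
        ranked = counts.most_common(2)
        # Accept the top candidate only if it is not tied with the runner-up.
        if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
            matches[user_a] = ranked[0][0]
    return matches
```

In this sketch, a longer observation period or denser sampling gives each user more points, which tends to separate the best candidate from the rest. A tie leaves the user unmatched rather than guessing.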
