A Public and Reproducible Assessment of the Topics API on Real Data
Yohan Beugin, Patrick McDaniel

TL;DR
This study provides a transparent, reproducible evaluation of Google's Topics API using a large, real dataset, revealing privacy risks and the variability of user privacy protection over time.
Contribution
It offers the first public, reproducible assessment of the Topics API on real browsing data, highlighting privacy vulnerabilities and encouraging open evaluation practices.
Findings
Re-identification probability is 2-4% over multiple observations.
Topics API's privacy guarantees vary among users.
Information leakage increases with more data exposure.
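To make the idea of re-identification concrete, here is a minimal sketch of one way to measure how many users have a unique set of observed topics. It is not the authors' code or methodology. The function name `unique_fraction`, the user IDs, and the topic labels are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def unique_fraction(profiles):
    """Return the fraction of users whose topic profile no other user shares.

    `profiles` maps a (hypothetical) user ID to the topics observed for
    that user. This is an illustrative metric, not the paper's procedure.
    """
    if not profiles:
        return 0.0
    # Topic order does not matter here, so compare profiles as frozensets.
    keyed = {user: frozenset(topics) for user, topics in profiles.items()}
    counts = Counter(keyed.values())
    unique_users = sum(1 for key in keyed.values() if counts[key] == 1)
    return unique_users / len(keyed)


# Toy, made-up data: u1 and u2 share a profile, u3 and u4 are unique.
toy_profiles = {
    "u1": ["News", "Sports"],
    "u2": ["Sports", "News"],
    "u3": ["Travel", "Cooking"],
    "u4": ["Music"],
}
fraction = unique_fraction(toy_profiles)
```

A user with a unique profile could in principle be matched across sites by an observer who sees the same topics in both places. Repeating the measurement as more observations accumulate per user would show how that share changes with exposure.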
Abstract
The Topics API for the web is Google's privacy-enhancing alternative to replace third-party cookies. Results of prior work have led to an ongoing discussion between Google and research communities about how well Topics balances utility and privacy. The main point of contention is the realism and reproducibility of the datasets used in these analyses: researchers have used data collected from small samples of users or generated synthetic datasets, while Google's results are inferred from a private dataset. In this paper, we complement prior research by performing a reproducible assessment of the latest version of the Topics API on the largest publicly available dataset of real browsing histories. First, we measure how unique and stable real users' interests are over time. Then, we evaluate whether Topics can be used to fingerprint the users from these…
Taxonomy
Topics: Scientific Computing and Data Management
