Semantic Identification Attacks on Web Browsing
Neel Guha

TL;DR
This paper presents a Semantic Identification Attack that leverages semantic signals from browsing sessions to re-identify users across sessions, highlighting privacy risks even with coarse semantic data.
Contribution
The paper introduces a novel attack that uses semantic signals to re-identify users across browsing sessions, and demonstrates it on the MSNBC Anonymous Browsing data set.
Findings
Semantic signals can effectively re-identify users across sessions.
Even coarse semantic information suffices for user identification.
The paper discusses potential countermeasures users can take to defend against this attack.
Abstract
We introduce a Semantic Identification Attack, in which an adversary uses semantic signals about the pages visited in one browsing session to identify other browsing sessions launched by the same user. This attack allows an adversary to determine if two browsing sessions originate from the same user regardless of any measures taken by the user to disguise their browser or network. We use the MSNBC Anonymous Browsing data set, which contains a large set of user visits (labeled by category), to implement such an attack and show that even very coarse semantic information is enough to identify users. We discuss potential countermeasures users can take to defend against this attack.
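The idea of matching sessions by the categories of pages visited can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the matching method, so the sketch assumes each session is summarized as a normalized category-frequency profile and that profiles are compared by cosine similarity. The function names, user labels, and category labels below are all hypothetical toy values.

```python
from collections import Counter
from math import sqrt


def category_profile(session):
    """Summarize a session (a list of category labels) as relative frequencies."""
    counts = Counter(session)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse profiles stored as dicts."""
    dot = sum(value * q.get(cat, 0.0) for cat, value in p.items())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def best_match(target_session, known_sessions):
    """Return the known user whose session profile is closest to the target."""
    target = category_profile(target_session)
    scores = {
        user: cosine_similarity(target, category_profile(session))
        for user, session in known_sessions.items()
    }
    return max(scores, key=scores.get), scores


# Toy data (assumed for illustration): one earlier session per known user.
known = {
    "user_a": ["sports", "sports", "news", "sports", "weather"],
    "user_b": ["tech", "tech", "business", "news", "tech"],
}
# A later, unlabeled session that the adversary wants to attribute.
unlabeled = ["sports", "weather", "sports", "sports"]

matched_user, similarity_scores = best_match(unlabeled, known)
print(matched_user, similarity_scores)
```

Note that nothing in the sketch depends on browser or network identifiers; only the category labels are used, which is why disguising the browser or network would not affect this kind of matching.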
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Internet Traffic Analysis and Secure E-voting · Spam and Phishing Detection
