Semantic Identification Attacks on Web Browsing

Neel Guha

arXiv:1610.09417·cs.CR·November 1, 2016

TL;DR

This paper presents a Semantic Identification Attack that leverages semantic signals from browsing sessions to re-identify users across sessions, highlighting privacy risks even with coarse semantic data.

Contribution

The paper introduces an attack that uses semantic signals to re-identify users across browsing sessions and demonstrates it on the MSNBC Anonymous Browsing data set, in which page visits are labeled by category.

Findings

01

Semantic signals can effectively re-identify users across sessions.

02

Even coarse semantic information suffices for user identification.

03

The paper discusses potential counter-measures users can take to defend against this attack.

Abstract

We introduce a Semantic Identification Attack, in which an adversary uses semantic signals about the pages visited in one browsing session to identify other browsing sessions launched by the same user. This attack allows an adversary to determine if two browsing sessions originate from the same user regardless of any measures taken by the user to disguise their browser or network. We use the MSNBC Anonymous Browsing data set, which contains a large set of user visits (labeled by category), to implement such an attack and show that even very coarse semantic information is enough to identify users. We discuss potential countermeasures users can take to defend against this attack.
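
The abstract does not specify how sessions are compared, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes each session is a list of page-category labels, summarizes a session as a normalized category histogram, and links a new session to the known session with the highest cosine similarity. The function names, the choice of cosine similarity, the category labels, and the toy sessions are all illustrative assumptions.

```python
from collections import Counter
from math import sqrt


def category_profile(session):
    """Summarize a session (list of category labels) as category -> share of visits."""
    counts = Counter(session)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse profiles stored as dicts."""
    dot = sum(value * q.get(cat, 0.0) for cat, value in p.items())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # an empty session carries no semantic signal
    return dot / (norm_p * norm_q)


def most_similar_session(target, candidates):
    """Return the id of the candidate session whose profile is closest to the target."""
    target_profile = category_profile(target)
    return max(
        candidates,
        key=lambda sid: cosine_similarity(
            target_profile, category_profile(candidates[sid])
        ),
    )


# Hypothetical toy data: session ids and category labels are made up.
known_sessions = {
    "user_a": ["sports", "sports", "news", "sports", "weather"],
    "user_b": ["tech", "business", "tech", "news", "tech"],
}
new_session = ["sports", "weather", "sports", "sports"]

match = most_similar_session(new_session, known_sessions)
print(match)
```

The sketch uses only category labels, with no browser or network identifiers, which is the point the abstract makes about coarse semantic information. A realistic evaluation would need many users and a decision threshold rather than a forced best match.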

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Internet Traffic Analysis and Secure E-voting · Spam and Phishing Detection