Fingerprinting Search Keywords over HTTPS at Scale
Junhua Yan, Hasan Faik Alan, Jasleen Kaur

TL;DR
This paper investigates whether a user's search keywords can be fingerprinted from HTTPS traffic. It examines the effect of client platform, search engine, feature set, and classification framework, and evaluates the attack in closed-world and open-world settings on nearly 4 million search queries.
Contribution
The paper provides a broad analysis of keyword fingerprinting in HTTPS traffic across multiple factors and a large dataset, a threat that the authors note has received little attention in the network traffic analysis literature.
Findings
Fingerprinting accuracy varies with client platform and search engine.
Certain feature sets enable effective keyword classification.
Large-scale data reveals significant privacy vulnerabilities in HTTPS search traffic.
Abstract
The possibility of fingerprinting the search keywords issued by a user on popular web search engines is a significant threat to user privacy. This threat has received surprisingly little attention in the network traffic analysis literature. In this work, we consider the problem of keyword fingerprinting of HTTPS traffic -- we study the impact of several factors, including client platform diversity, choice of search engine, feature sets as well as classification frameworks. We conduct both closed-world and open-world evaluations using nearly 4 million search queries collected over a period of three months. Our analysis reveals several insights into the threat of keyword fingerprinting in modern HTTPS traffic.
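The distinction between closed-world and open-world evaluation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the nearest-centroid classifier, the feature vectors (imagined here as sizes in bytes of the first three encrypted responses), the keywords, and the rejection threshold are all assumptions chosen for illustration.

```python
from math import dist

# Hypothetical training traces: keyword -> feature vectors.
# The values are invented for illustration, not taken from the paper.
TRAIN = {
    "weather": [(510.0, 1200.0, 300.0), (520.0, 1180.0, 310.0)],
    "flights": [(900.0, 400.0, 1500.0), (880.0, 420.0, 1490.0)],
}


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))


CENTROIDS = {keyword: centroid(vectors) for keyword, vectors in TRAIN.items()}


def classify_closed(trace):
    """Closed world: every trace is assumed to come from a monitored keyword,
    so the nearest centroid is always returned."""
    return min(CENTROIDS, key=lambda keyword: dist(trace, CENTROIDS[keyword]))


def classify_open(trace, threshold=100.0):
    """Open world: a trace may come from an unmonitored keyword, so traces
    far from every centroid are rejected (threshold is an assumed value)."""
    best = classify_closed(trace)
    if dist(trace, CENTROIDS[best]) > threshold:
        return "unmonitored"
    return best
```

In the closed-world setting the classifier must pick one of the monitored keywords even for an unrelated trace, whereas the open-world variant can decline to label it. This is why open-world evaluation is generally regarded as the more realistic test of a fingerprinting attack.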
Taxonomy
Topics: Internet Traffic Analysis and Secure E-voting · Spam and Phishing Detection · Privacy, Security, and Data Protection
