A Measurement of Genuine Tor Traces for Realistic Website Fingerprinting
Rob Jansen, Ryan Wails, Aaron Johnson

TL;DR
This paper introduces GTT23, the first large-scale, real-world dataset of genuine Tor traces for website fingerprinting research, addressing the limitations of synthetic datasets and enabling more accurate evaluation of WF attacks.
Contribution
The paper provides GTT23, a comprehensive real Tor trace dataset, and compares it with existing synthetic datasets, highlighting their deficiencies for realistic WF attack evaluation.
Findings
GTT23 is at least an order of magnitude larger than any existing WF dataset and represents real Tor user behavior better.
Synthetic datasets often misrepresent real-world Tor user behavior.
GTT23 enables more accurate evaluation of WF attack effectiveness.
Abstract
Website fingerprinting (WF) is a dangerous attack on web privacy because it enables an adversary to predict the website a user is visiting, despite the use of encryption, VPNs, or anonymizing networks such as Tor. Previous WF work almost exclusively uses synthetic datasets to evaluate the performance and estimate the feasibility of WF attacks despite evidence that synthetic data misrepresents the real world. In this paper we present GTT23, the first WF dataset of genuine Tor traces, which we obtain through a large-scale measurement of the Tor network and which is intended especially for WF. It represents real Tor user behavior better than any existing WF dataset, is larger than any existing WF dataset by at least an order of magnitude, and will help ground the future study of realistic WF attacks and defenses. In a detailed evaluation, we survey 28 WF datasets published since 2008 and…
Taxonomy
Topics: Internet Traffic Analysis and Secure E-voting · Privacy-Preserving Technologies in Data · Hate Speech and Cyberbullying Detection
