Exposing the Hidden Web: An Analysis of Third-Party HTTP Requests on 1 Million Websites
Timothy Libert

TL;DR
This study analyzes privacy-compromising mechanisms on 1 million popular websites. It finds widespread leakage of user data to third parties, heavy use of third-party cookies and externally loaded JavaScript, and potential exposure to known NSA spying techniques.
Contribution
It provides a large-scale quantitative analysis of third-party requests and data leaks, exposing the extent of privacy compromises on popular websites.
Findings
Nearly 90% of websites leak user data to parties of which the user is likely unaware
Over 60% of websites set third-party cookies
More than 80% load external JavaScript code
Abstract
This article provides a quantitative analysis of privacy-compromising mechanisms on 1 million popular websites. Findings indicate that nearly 9 in 10 websites leak user data to parties of which the user is likely unaware; more than 6 in 10 websites spawn third-party cookies; and more than 8 in 10 websites load JavaScript code from external parties onto users' computers. Sites that leak user data contact an average of nine external domains, indicating that users may be tracked by multiple entities in tandem. Tracing the unintended disclosure of personal browsing histories on the Web reveals that a handful of U.S. companies receive the vast bulk of user data. Finally, roughly 1 in 5 websites were potentially vulnerable to known National Security Agency spying techniques at the time of analysis.
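To make the notion of "external domains" concrete, the following is a minimal sketch of how the third-party domains contacted during one page load might be counted. It is not the author's tooling. The function names, the sample URLs, and the rule of treating the last two hostname labels as the site's domain are all assumptions made for illustration; a real analysis would consult the Public Suffix List to handle suffixes such as co.uk.

```python
from urllib.parse import urlsplit


def base_domain(url: str) -> str:
    """Naive registrable domain: the last two labels of the hostname.

    Assumption: multi-label public suffixes (e.g. co.uk) are ignored here.
    """
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def third_party_domains(page_url: str, request_urls: list[str]) -> set[str]:
    """Return the distinct external domains contacted while loading page_url."""
    first_party = base_domain(page_url)
    domains = {base_domain(u) for u in request_urls}
    domains.discard("")           # relative or hostless URLs
    domains.discard(first_party)  # requests back to the site itself
    return domains


# Hypothetical request log for a single page load.
page = "https://www.example.com/news"
requests_seen = [
    "https://static.example.com/site.css",     # same site, different subdomain
    "https://cdn.tracker-one.test/pixel.gif",
    "https://ads.tracker-two.test/ad.js",
    "https://api.tracker-one.test/collect",    # same external party as above
]
external = third_party_domains(page, requests_seen)
print(sorted(external))
```

Under these assumptions, two requests to different subdomains of the same external party count as one domain, so averaging the size of this set across sites gives a per-site figure of the same kind as the one the abstract reports.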
Taxonomy
Topics: Internet Traffic Analysis and Secure E-voting · Advanced Malware Detection Techniques · Spam and Phishing Detection
