Understanding Privacy Norms through Web Forms
Hao Cui, Rahmadi Trimananda, Athina Markopoulou

TL;DR
This study analyzes web forms across 11,500 websites to uncover embedded privacy norms, revealing patterns driven by function and law, and highlighting discrepancies between privacy policies and actual data collection practices.
Contribution
The paper introduces a scalable method to annotate and analyze web forms for privacy norms, combining web crawling, text classification, and large language models.
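To make the form-discovery step concrete, here is a minimal sketch of pulling forms and their visible fields out of a page's HTML. It is an illustration only, not the authors' crawler: the names `FormExtractor` and `extract_forms` are hypothetical, it uses just the Python standard library, and it ignores JavaScript-rendered forms, which a real crawler would need to handle.

```python
from html.parser import HTMLParser


class FormExtractor(HTMLParser):
    """Collect each <form> and the user-facing fields it contains (hypothetical helper)."""

    FIELD_TAGS = {"input", "select", "textarea"}
    SKIPPED_INPUT_TYPES = {"hidden", "submit", "button", "reset"}

    def __init__(self):
        super().__init__()
        self.forms = []       # completed forms, in document order
        self._current = None  # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"action": attrs.get("action") or "", "fields": []}
        elif tag in self.FIELD_TAGS and self._current is not None:
            field_type = (attrs.get("type") or "").lower()
            if field_type in self.SKIPPED_INPUT_TYPES:
                return  # not something the user fills in
            self._current["fields"].append({
                "tag": tag,
                "type": field_type,
                "name": attrs.get("name") or "",
                "placeholder": attrs.get("placeholder") or "",
            })

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def extract_forms(html):
    """Return a list of forms found in an HTML string."""
    parser = FormExtractor()
    parser.feed(html)
    parser.close()
    return parser.forms
```

The field names and placeholders collected this way are the kind of text a downstream classifier or language model could label with a form type and personal data types.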
Findings
Web forms follow privacy norms driven by function and legal needs.
Deviations from norms often indicate unnecessary data collection.
Privacy policies frequently do not align with observed data practices.
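One simple way to picture the notion of norms and deviations is to measure how often each data type is requested within a form type, then flag requests that are rare for that form type. The sketch below assumes this frequency-based reading; it is not necessarily the authors' analysis. The function names, the `threshold` value, and the example labels in the usage note are hypothetical.

```python
from collections import Counter, defaultdict


def collection_rates(annotated_forms):
    """Map each form type to {data_type: fraction of its forms requesting it}.

    annotated_forms is an iterable of (form_type, data_types) pairs.
    """
    totals = Counter()
    counts = defaultdict(Counter)
    for form_type, data_types in annotated_forms:
        totals[form_type] += 1
        for data_type in set(data_types):  # count each type once per form
            counts[form_type][data_type] += 1
    return {
        form_type: {dt: n / totals[form_type] for dt, n in counts[form_type].items()}
        for form_type in totals
    }


def unusual_requests(form_type, data_types, rates, threshold=0.1):
    """Return the data types requested less often than `threshold` for this form type."""
    typical = rates.get(form_type, {})
    return sorted(dt for dt in set(data_types) if typical.get(dt, 0.0) < threshold)
```

For instance, if nearly every form labeled "newsletter" asks only for an email address, a newsletter form that also asks for a date of birth would be returned by `unusual_requests` as a candidate for unnecessary data collection.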
Abstract
Web forms are one of the primary ways to collect personal information online, yet they are relatively under-studied. Unlike web tracking, data collection through web forms is explicit and contextualized. Users (i) are asked to input specific personal information types, and (ii) know the specific context (i.e., on which website and for what purpose). For web forms to be trusted by users, they must meet the common sense standards of appropriate data collection practices within a particular context (i.e., privacy norms). In this paper, we extract the privacy norms embedded within web forms through a measurement study. First, we build a specialized crawler to discover web forms on websites. We run it on 11,500 popular websites, and we create a dataset of 293K web forms. Second, to process data of this scale, we develop a cost-efficient way to annotate web forms with form types and personal…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Privacy, Security, and Data Protection
