Beyond The Text: Analysis of Privacy Statements through Syntactic and Semantic Role Labeling
Yan Shvartzshnaider, Ananth Balashankar, Vikas Patidar, Thomas Wies, Lakshminarayanan Subramanian

TL;DR
This paper introduces a new NLP task for extracting privacy parameters from policies using syntactic and semantic role labeling, showing that combining dependency parsing with semantic role labeling yields the best accuracy, especially when domain knowledge is incorporated.
Contribution
It formulates privacy parameter extraction as a novel NLP task and demonstrates that combining dependency parsing with semantic role labeling improves accuracy over traditional methods.
Findings
Combining dependency parsing with semantic role labeling achieves the highest accuracy.
Traditional NLP methods like HMMs and BERT fine-tuning are less effective.
Domain-specific knowledge significantly enhances extraction precision and recall.
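To make the semantic role labeling idea concrete, here is a minimal sketch of how role labels for one statement might be mapped onto Contextual Integrity parameters. It is not the authors' pipeline: the `ROLE_TO_CI` table, the `frame_to_ci` helper, the example sentence, and its hand-written role annotations are all assumptions for illustration. No parser is run, and treating a purpose modifier as the transmission principle is a simplification.

```python
from typing import Dict, Optional

# The five Contextual Integrity parameters of an information flow.
CI_PARAMETERS = ("sender", "recipient", "subject", "attribute", "transmission_principle")

# Hypothetical mapping from PropBank-style role labels to CI parameters for a
# "share"-type verb. This table is an illustrative assumption, not the paper's rules.
ROLE_TO_CI = {
    "ARG0": "sender",                       # who shares
    "ARG1": "attribute",                    # what is shared
    "ARG2": "recipient",                    # whom it is shared with
    "ARGM-PRP": "transmission_principle",   # purpose modifier (a simplification)
}


def frame_to_ci(frame: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map one labeled predicate frame to CI parameters; unfilled ones stay None."""
    flow: Dict[str, Optional[str]] = {param: None for param in CI_PARAMETERS}
    for role, span in frame.items():
        param = ROLE_TO_CI.get(role)
        if param is not None:
            flow[param] = span
    return flow


# Hand-annotated frame (assumed, not parser output) for the sentence:
# "We share your email address with advertisers for marketing purposes."
frame = {
    "V": "share",
    "ARG0": "We",
    "ARG1": "your email address",
    "ARG2": "with advertisers",
    "ARGM-PRP": "for marketing purposes",
}

flow = frame_to_ci(frame)
for param in CI_PARAMETERS:
    print(f"{param}: {flow[param]}")
```

In this toy frame the subject parameter stays empty, which mirrors the kind of incomplete statement the abstract describes. A dependency parse or domain knowledge would be needed to recover it, for instance by resolving "your" to the user.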
Abstract
This paper formulates a new task of extracting privacy parameters from a privacy policy, through the lens of Contextual Integrity, an established social theory framework for reasoning about privacy norms. Privacy policies, written by lawyers, are lengthy and often comprise incomplete and vague statements. In this paper, we show that traditional NLP tasks, including the recently proposed Question-Answering based solutions, are insufficient to address the privacy parameter extraction problem and provide poor precision and recall. We describe 4 different types of conventional methods that can be partially adapted to address the parameter extraction task with varying degrees of success: Hidden Markov Models, BERT fine-tuned models, Dependency Type Parsing (DP) and Semantic Role Labeling (SRL). Based on a detailed evaluation across 36 real-world privacy policies of major enterprises, we…
Taxonomy
Topics: Privacy, Security, and Data Protection · Privacy-Preserving Technologies in Data · Hate Speech and Cyberbullying Detection
Methods: Linear Layer · Adam · Softmax · Residual Connection · Dense Connections · Layer Normalization · WordPiece · Multi-Head Attention · Weight Decay
