Demystifying Legalese: An Automated Approach for Summarizing and Analyzing Overlaps in Privacy Policies and Terms of Service
Shikha Soneji, Mitchell Hoesing, Sujay Koujalgi, Jonathan Dodge

TL;DR
This paper develops and evaluates language models, with RoBERTa performing best, to automatically summarize and score privacy policies and terms of service, aiming to improve user understanding and to surface potential GDPR compliance issues.
Contribution
It introduces an automated approach using transformer models to summarize legal documents and identify overlaps, highlighting redundancies and compliance issues.
Findings
RoBERTa performed best overall among the transformer-based and conventional models compared, with a 0.74 F1-score.
Using RoBERTa, the authors identified overlaps in GDPR-required documents, highlighting redundancies and potential guideline violations.
The overlaps found underscore the need for stricter GDPR compliance.
Abstract
The complexities of legalese in terms and policy documents can bind individuals to contracts they do not fully comprehend, potentially leading to uninformed data sharing. Our work seeks to alleviate this issue by developing language models that provide automated, accessible summaries and scores for such documents, aiming to enhance user understanding and facilitate informed decisions. We compared transformer-based and conventional models during training on our dataset, and RoBERTa performed better overall with a remarkable 0.74 F1-score. Leveraging our best-performing model, RoBERTa, we highlighted redundancies and potential guideline violations by identifying overlaps in GDPR-required documents, underscoring the necessity for stricter GDPR compliance.
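For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how an F1-score is computed from prediction counts. The function name and the counts are hypothetical and are not taken from the paper. The abstract does not state whether the reported 0.74 is a micro-, macro-, or weighted average, so this shows only the basic single-class definition.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the harmonic mean of precision and recall (0.0 when undefined)."""
    predicted = true_positives + false_positives   # everything the model flagged
    actual = true_positives + false_negatives      # everything that should be flagged
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the arithmetic:
# precision = 8/10, recall = 8/12, so F1 = 16/22.
example_f1 = f1_score(true_positives=8, false_positives=2, false_negatives=4)
```

Because F1 is a harmonic mean, it is pulled toward the lower of precision and recall, so a model cannot score well by favoring one at the expense of the other.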
Taxonomy
Topics: Privacy, Security, and Data Protection
Methods: Attention Is All You Need · Weight Decay · Dense Connections · Residual Connection · Softmax · Adam · Linear Warmup With Linear Decay · Layer Normalization · Attention Dropout
