Large Language Models: A New Approach for Privacy Policy Analysis at Scale
David Rodriguez, Ian Yang, Jose M. Del Alamo, Norman Sadeh

TL;DR
This paper demonstrates that Large Language Models like ChatGPT and Llama 2 can effectively automate privacy policy analysis at scale, outperforming traditional NLP methods in accuracy, cost, and efficiency.
Contribution
It introduces a novel application of LLMs for privacy policy analysis, providing guidance on prompt design and validating performance with benchmark datasets.
Findings
F1 score exceeds 93% on benchmark datasets
Reduces costs and processing times compared to traditional methods
Requires less technical expertise for implementation
Abstract
The number and dynamic nature of web and mobile applications presents significant challenges for assessing their compliance with data protection laws. In this context, symbolic and statistical Natural Language Processing (NLP) techniques have been employed for the automated analysis of these systems' privacy policies. However, these techniques typically require labor-intensive and potentially error-prone manually annotated datasets for training and validation. This research proposes the application of Large Language Models (LLMs) as an alternative for effectively and efficiently extracting privacy practices from privacy policies at scale. Particularly, we leverage well-known LLMs such as ChatGPT and Llama 2, and offer guidance on the optimal design of prompts, parameters, and models, incorporating advanced strategies such as few-shot learning. We further illustrate its capability to…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsPrivacy, Security, and Data Protection · Privacy-Preserving Technologies in Data
MethodsLLaMA
