Automating Governing Knowledge Commons and Contextual Integrity (GKC-CI) Privacy Policy Annotations with Large Language Models
Jake Chanenson, Madison Pickering, Noah Apthorpe

TL;DR
This paper shows that large language models can automatically annotate privacy policies with high accuracy, matching expert performance, thereby enabling scalable privacy analysis.
Contribution
It introduces a fine-tuned LLM approach for GKC-CI annotation, achieving expert-level accuracy and scaling privacy policy analysis.
Findings
Best model accuracy of 90.65%
Automatic annotation matches expert performance
Applied to 456 privacy policies for analysis
Abstract
Identifying contextual integrity (CI) and governing knowledge commons (GKC) parameters in privacy policy texts can facilitate normative privacy analysis. However, GKC-CI annotation has heretofore required manual or crowdsourced effort. This paper demonstrates that high-accuracy GKC-CI parameter annotation of privacy policies can be performed automatically using large language models. We fine-tune 50 open-source and proprietary models on 21,588 ground truth GKC-CI annotations from 16 privacy policies. Our best performing model has an accuracy of 90.65%, which is comparable to the accuracy of experts on the same task. We apply our best performing model to 456 privacy policies from a variety of online services, demonstrating the effectiveness of scaling GKC-CI annotation for privacy policy exploration and analysis. We publicly release our model training code, training and testing data, an…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsPrivacy-Preserving Technologies in Data · Privacy, Security, and Data Protection · Internet Traffic Analysis and Secure E-voting
MethodsMulti-Head Attention · 15 Ways to Contact How can i speak to someone at Delta Airlines · Attention Is All You Need · Linear Layer · Cosine Annealing · Linear Warmup With Cosine Annealing · Byte Pair Encoding · Dropout · Weight Decay · Softmax
