APPCorp: A Corpus for Android Privacy Policy Document Structure Analysis
Shuang Liu, Renjie Guo, Baiyang Zhao, Tao Chen, Meishan Zhang

TL;DR
This paper introduces APPCorp, a manually annotated corpus of 167 Android privacy policies, to facilitate automatic document structure analysis and improve user understanding of privacy terms.
Contribution
It provides a new, publicly available annotated corpus and benchmarks multiple classification models for privacy policy document analysis.
Findings
Benchmark results show varying effectiveness of models.
Analysis highlights key challenges in privacy policy classification.
The corpus enables future research in privacy policy understanding.
Abstract
With the increasing popularity of mobile devices and the wide adoption of mobile apps, concern about privacy issues is growing. The privacy policy is identified as a proper medium to indicate the legal terms, such as those of GDPR, and to bind a legal agreement between service providers and users. However, privacy policies are usually too long and vague for end users to read and understand. It is thus important to be able to automatically analyze the document structure of privacy policies to assist user understanding. In this work we create a manually labelled corpus containing 167 privacy policies (of more than K words and annotated paragraphs). We report the annotation process and details of the annotated corpus. We also benchmark our data corpus with document classification models, thoroughly analyze the results and discuss challenges and opportunities for the research…
Taxonomy
Topics: Privacy, Security, and Data Protection · Advanced Malware Detection Techniques · Mobile Health and mHealth Applications
