A Fine-grained Chinese Software Privacy Policy Dataset for Sequence Labeling and Regulation Compliant Identification
Kaifa Zhao, Le Yu, Shiyao Zhou, Jing Li, Xiapu Luo, Yat Fei Aemon, Chiu, Yutong Liu

TL;DR
This paper introduces CA4P-483, the first Chinese privacy policy dataset with detailed annotations, enabling NLP analysis for understanding and regulating software privacy policies in Chinese applications.
Contribution
It creates a comprehensive Chinese privacy policy dataset with fine-grained annotations, addressing language and legal requirement gaps in existing datasets.
Findings
Baseline models achieve moderate performance on the dataset.
Analysis reveals challenges in sequence labeling of legal and technical terms.
Potential applications include regulation compliance and program analysis.
Abstract
Privacy protection raises great attention on both legal levels and user awareness. To protect user privacy, countries enact laws and regulations requiring software privacy policies to regulate their behavior. However, privacy policies are written in natural languages with many legal terms and software jargon that prevent users from understanding and even reading them. It is desirable to use NLP techniques to analyze privacy policies for helping users understand them. Furthermore, existing datasets ignore law requirements and are limited to English. In this paper, we construct the first Chinese privacy policy dataset, namely CA4P-483, to facilitate the sequence labeling tasks and regulation compliance identification between privacy policies and software. Our dataset includes 483 Chinese Android application privacy policies, over 11K sentences, and 52K fine-grained annotations. We…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsPrivacy, Security, and Data Protection
