TL;DR
This paper introduces APPSI-139, a high-quality English privacy policy corpus with expert annotations, and proposes TCSI-pp-V2, a hybrid summarization framework that outperforms large language models in readability and reliability.
Contribution
The paper contributes an expert-annotated privacy policy corpus and a hybrid summarization framework that improves on existing large language models in readability and interpretability.
Findings
The hybrid system outperforms GPT-4o and LLaMA-3-70B in readability.
APPSI-139 contains 139 English privacy policies, 15,692 rewritten parallel corpora, and 36,351 fine-grained annotation labels across 11 data practice categories.
The proposed framework balances efficiency and accuracy.
Abstract
Privacy policies are essential for users to understand how service providers handle their personal data. However, these documents are often long, complex, and filled with technobabble and legalese, causing users to unknowingly accept terms that may even contradict the law. While summarizing and interpreting these privacy policies is crucial, there is a lack of high-quality English parallel corpora optimized for legal clarity and readability. To address this issue, we introduce APPSI-139, a high-quality English privacy policy corpus meticulously annotated by domain experts and specifically designed for summarization and interpretation tasks. The corpus includes 139 English privacy policies, 15,692 rewritten parallel corpora, and 36,351 fine-grained annotation labels across 11 data practice categories. Concurrently, we propose TCSI-pp-V2, a hybrid privacy policy summarization and…
