APPSI-139: A Parallel Corpus of English Application Privacy Policy Summarization and Interpretation

Pengyun Zhu, Qiheng Sun, Long Wen, Yanbo Wang, Yang Cao, Junxu Liu, Deyi Xiong, Jinfei Liu, Zhibo Wang, and Kui Ren

arXiv:2604.27550 · cs.CL · May 1, 2026

TL;DR

This paper introduces APPSI-139, a high-quality English privacy policy corpus with expert annotations, and proposes TCSI-pp-V2, a hybrid summarization framework that outperforms large language models in readability and reliability.

Contribution

The paper provides a meticulously annotated privacy policy corpus and a novel hybrid summarization framework that enhances readability and interpretability over existing large language models.

Findings

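01. The TCSI-pp-V2 hybrid framework outperforms GPT-4o and LLaMA-3-70B in readability.

02. APPSI-139 contains 139 English privacy policies, 15,692 rewritten parallel corpora, and 36,351 fine-grained annotation labels across 11 data practice categories.

03. The proposed framework balances efficiency and accuracy.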
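
As a rough sanity check on the corpus figures above, the reported totals can be turned into simple averages. The sketch below is only illustrative arithmetic on the three counts stated in this summary; the function and constant names are hypothetical and are not taken from the APPSI-139 repository, and the averages say nothing about how rewrites or labels are actually distributed across individual policies.

```python
# Illustrative arithmetic only. All names are hypothetical and are not
# part of the APPSI-139 repository; the three counts come from the summary.
NUM_POLICIES = 139      # English privacy policies
NUM_REWRITES = 15_692   # rewritten parallel corpora
NUM_LABELS = 36_351     # fine-grained annotation labels


def corpus_averages(policies: int, rewrites: int, labels: int) -> dict[str, float]:
    """Return simple ratios derived from the reported corpus totals."""
    if policies <= 0 or rewrites <= 0:
        raise ValueError("policies and rewrites must be positive")
    return {
        "rewrites_per_policy": rewrites / policies,
        "labels_per_policy": labels / policies,
        "labels_per_rewrite": labels / rewrites,
    }


if __name__ == "__main__":
    stats = corpus_averages(NUM_POLICIES, NUM_REWRITES, NUM_LABELS)
    for name, value in stats.items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions, each policy corresponds to roughly 113 rewritten items and about 262 labels on average, or a little over two labels per rewritten item.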

Abstract

Privacy policies are essential for users to understand how service providers handle their personal data. However, these documents are often long, complex, and filled with technobabble and legalese, causing users to unknowingly accept terms that may even contradict the law. While summarizing and interpreting these privacy policies is crucial, there is no high-quality English parallel corpus optimized for legal clarity and readability. To address this issue, we introduce APPSI-139, a high-quality English privacy policy corpus meticulously annotated by domain experts and specifically designed for summarization and interpretation tasks. The corpus includes 139 English privacy policies, 15,692 rewritten parallel corpora, and 36,351 fine-grained annotation labels across 11 data practice categories. Concurrently, we propose TCSI-pp-V2, a hybrid privacy policy summarization and…

Code & Models

Repositories

EnlightenedAI/APPSI-139 (GitHub)