PrivSTRUCT: Untangling Data Purpose Compliance of Privacy Policies in Google Play Store
Bhanuka Silva, Anirban Mahanti, Aruna Seneviratne, Suranga Seneviratne

TL;DR
PrivSTRUCT is a new framework that improves extraction of data practices from privacy policies by considering document structure, revealing significant transparency gaps and misstatements in app disclosures.
Contribution
It introduces PrivSTRUCT, a systematic encoder-decoder framework that outperforms existing tools by leveraging structural cues to better analyze privacy policies.
Findings
PrivSTRUCT extracts over twice as many data and purpose excerpts as PoliGrapher.
Developers overstate data purposes more when using global purposes rather than local disclosures.
Sensitive third-party data flows are often diluted and entangled into unrelated categories.
Abstract
Existing research typically treats privacy policies as flat, uniform text, extracting information without regard for the document's logical hierarchy. Disregarding the structural cues of section headings, which are designed to guide the reader, often leads automated methods to entangle distinct data practices, particularly when linking sensitive data items to their specific purposes. To address this, we introduce PrivSTRUCT, a novel and systematic framework that combines an encoder and a decoder to untangle complex privacy disclosures. Benchmarking against the state-of-the-art tool PoliGrapher reveals that PrivSTRUCT robustly extracts more than twice as many data item and purpose excerpts while retaining developer-defined structural cues. By applying PrivSTRUCT to a large-scale dataset of 3,756 Android apps, we uncover a critical transparency gap: the probability of developers overstating a data…
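To make the idea of retaining section headings concrete, here is a minimal sketch of heading-aware segmentation. It is not PrivSTRUCT's implementation, which this summary does not describe. It assumes a policy already converted to text with markdown-style `#` headings, and the function name `segment_with_headings` and the sample policy are hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical helper: NOT the PrivSTRUCT pipeline, only an illustration
# of keeping section context attached to each statement.
def segment_with_headings(policy: str) -> List[Tuple[Tuple[str, ...], str]]:
    """Pair each body line with the path of headings it sits under."""
    path: List[str] = []
    out: List[Tuple[Tuple[str, ...], str]] = []
    for raw in policy.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Assumption: headings are marked markdown-style, e.g. "## Location".
        match = re.match(r"^(#+)\s+(.*)$", line)
        if match:
            level = len(match.group(1))
            # Trim the path back to the parent level, then push this heading.
            del path[level - 1:]
            path.append(match.group(2))
        else:
            out.append((tuple(path), line))
    return out

# Made-up sample policy, for illustration only.
SAMPLE = """# Data we collect
## Location
We collect your precise location to show nearby stores.
## Contacts
We read your contact list to help you find friends.
# Sharing with third parties
We share device identifiers with advertising partners.
"""

segments = segment_with_headings(SAMPLE)
for headings, text in segments:
    print(" > ".join(headings), "|", text)
```

A flat-text extractor would see three unrelated sentences. Here each sentence carries its heading path, so a later step could tell a purpose stated under a specific data item apart from one stated under a general section.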
