TL;DR
PolicyGapper employs large language models to automatically identify discrepancies between Google Play Data Safety Sections and privacy policies, enhancing transparency and compliance verification.
Contribution
The paper introduces an automated, LLM-based approach for detecting inconsistencies between an app's Data Safety Section disclosures and its privacy policy, without requiring access to the app binaries.
Findings
Identified 2,689 omitted disclosures across 330 apps
Achieved an average F1-score of 0.76 in discrepancy detection
Released a complete reproducibility package including code and dataset
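The summary reports an average F1-score but does not say which units (apps, data types, or categories) are averaged. As a reference for reading that number, F1 is the harmonic mean of precision and recall, which simplifies to 2TP / (2TP + FP + FN). The sketch below assumes an unweighted (macro) mean over per-item counts. The helper names and the counts in the usage line are hypothetical.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*P*R / (P + R), which simplifies to 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mean_f1(counts: list[tuple[int, int, int]]) -> float:
    """Unweighted (macro) average of per-item F1 scores.

    Assumption: each tuple is (TP, FP, FN) for one item, e.g. one app.
    """
    return sum(f1_score(*c) for c in counts) / len(counts)


# Hypothetical counts for two items, for illustration only.
print(f1_score(3, 1, 1))                 # 0.75
print(mean_f1([(3, 1, 1), (1, 0, 1)]))
```

A micro average would instead pool TP, FP, and FN across all items before applying the formula. The two averages can differ noticeably when items vary in size.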
Abstract
Mobile application developers are required to disclose how they collect, use, and share user data in compliance with privacy regulations. To support transparency, major app marketplaces have introduced standardized disclosure mechanisms. In 2022, Google mandated the Data Safety Section (DSS) on Google Play, requiring developers to summarize their data practices. However, compiling accurate DSS disclosures is challenging, as they must remain consistent with the corresponding privacy policy (PP), and no automated tool currently verifies this alignment. Prior studies indicate that nearly 80% of popular apps contain incomplete or misleading DSS declarations. We present PolicyGapper, an LLM-based methodology for automatically detecting discrepancies between DSS disclosures and privacy policies. PolicyGapper operates in four stages: scraping, pre-processing, analysis, and post-processing,…
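The comparison at the heart of this pipeline can be illustrated with a toy version. The following is a minimal sketch, not PolicyGapper's implementation. It makes several assumptions. It treats a discrepancy as a set difference between the data types declared in the DSS and those mentioned in the privacy policy. It skips the scraping stage and passes both inputs in directly. It replaces the LLM analysis stage with a hypothetical keyword table (`KEYWORDS`). All function names are illustrative.

```python
from dataclasses import dataclass

# Hypothetical keyword table standing in for the LLM analysis stage.
# Keys imitate DSS data-type labels; values are phrases to look for in the PP.
KEYWORDS = {
    "Email address": ("email address", "e-mail"),
    "Precise location": ("precise location", "gps"),
    "Device or other IDs": ("device id", "advertising id"),
}


@dataclass
class Discrepancies:
    omitted_in_dss: set[str]     # mentioned in the PP but not declared in the DSS
    unsupported_in_pp: set[str]  # declared in the DSS but not mentioned in the PP


def preprocess(policy_text: str) -> str:
    """Pre-processing (toy): lowercase and collapse whitespace."""
    return " ".join(policy_text.lower().split())


def analyze(policy_text: str) -> set[str]:
    """Analysis (stub): an LLM would extract data types; here, keyword matching."""
    text = preprocess(policy_text)
    return {dtype for dtype, kws in KEYWORDS.items() if any(k in text for k in kws)}


def compare(dss_declared: set[str], policy_text: str) -> Discrepancies:
    """Post-processing (toy): report set differences in both directions."""
    pp_types = analyze(policy_text)
    return Discrepancies(
        omitted_in_dss=pp_types - dss_declared,
        unsupported_in_pp=dss_declared - pp_types,
    )


dss = {"Email address"}
policy = "We collect your e-mail and your GPS position to personalize ads."
result = compare(dss, policy)
print(sorted(result.omitted_in_dss))     # ['Precise location']
print(sorted(result.unsupported_in_pp))  # []
```

Keyword matching misses paraphrases and negations, such as "we never collect your location". A language model would be used in the analysis stage to close exactly this kind of gap.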
