Unveiling Privacy Policy Complexity: An Exploratory Study Using Graph Mining, Machine Learning, and Natural Language Processing

Vijayalakshmi Ramasamy; Seth Barrett; Gokila Dorai; Jessica Zumbach

arXiv:2507.02968 · cs.CR · July 8, 2025

TL;DR

This paper explores using graph visualization, machine learning, and NLP to analyze and improve understanding of complex privacy policies, revealing key themes and patterns to enhance transparency and compliance.

Contribution

It introduces a novel approach combining graph models, mining algorithms, and dimensionality reduction to interpret privacy policies and identify risks.

Findings

01. Graph-based clustering improves policy interpretability

02. Identifies key themes like User Activity and Device Information

03. Supports forensic investigations and compliance detection

Abstract

Privacy policy documents are often lengthy, complex, and difficult for non-expert users to interpret, leading to a lack of transparency regarding the collection, processing, and sharing of personal data. As concerns over online privacy grow, it is essential to develop automated tools capable of analyzing privacy policies and identifying potential risks. In this study, we explore the potential of interactive graph visualizations to enhance user understanding of privacy policies by representing policy terms as structured graph models. This approach makes complex relationships more accessible and enables users to make informed decisions about their personal data (RQ1). We also employ graph mining algorithms to identify key themes, such as User Activity and Device Information, using dimensionality reduction techniques like t-SNE and PCA to assess clustering effectiveness. Our findings…
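The abstract's idea of representing policy terms as a graph and mining it for themes can be sketched in a few lines. The sketch below is not the authors' pipeline: the sentences, the term list, and the use of sentence-level co-occurrence with connected components are all assumptions chosen for illustration, standing in for whatever NLP extraction and graph mining algorithms the paper actually uses.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy policy sentences, invented for illustration only.
SENTENCES = [
    "we collect device identifier and ip address",
    "device identifier is linked to ip address and browser type",
    "we record browsing history and search queries",
    "search queries and browsing history are shared with advertisers",
]

# Hypothetical term vocabulary; a real pipeline would extract terms with NLP.
TERMS = [
    "device identifier", "ip address", "browser type",
    "browsing history", "search queries", "advertisers",
]


def build_cooccurrence_graph(sentences, terms):
    """Terms are nodes; an edge's weight counts the sentences containing both terms."""
    graph = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        present = [t for t in terms if t in sentence]
        for a, b in combinations(present, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def connected_components(graph, terms):
    """Group terms with a depth-first search; isolated terms become singletons."""
    seen, components = set(), []
    for start in terms:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(n for n in graph[node] if n not in seen)
        components.append(component)
    return components


graph = build_cooccurrence_graph(SENTENCES, TERMS)
themes = connected_components(graph, TERMS)

for i, theme in enumerate(themes, start=1):
    print(f"theme {i}: {sorted(theme)}")
```

On this toy input the terms fall into two groups, one about device details and one about user behavior, which loosely mirrors the Device Information and User Activity themes named in the abstract. Connected components are a deliberately crude stand-in: on a real policy most terms would end up in one large component, so a community detection method over the weighted edges would be the more realistic choice.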
