LADFA: A Framework of Using Large Language Models and Retrieval-Augmented Generation for Personal Data Flow Analysis in Privacy Policies
Haiyue Yuan, Nikolay Matyunin, Ali Raza, Shujun Li

TL;DR
LADFA is a framework that combines large language models (LLMs) with retrieval-augmented generation (RAG) to automatically analyze and visualize personal data flows in privacy policies, with the aim of aiding understanding and compliance.
Contribution
This work introduces LADFA, a flexible framework that integrates LLMs and RAG with a customised knowledge base, derived from existing studies, for detailed privacy policy analysis.
Findings
Effective extraction of personal data flows demonstrated in automotive privacy policies
High accuracy in constructing data flow graphs from unstructured privacy policy texts
Framework's flexibility allows adaptation to various text analysis tasks
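To make the idea of a data flow graph concrete, here is a minimal sketch of how extracted flows could be grouped into a graph. It is an illustration only: the `DataFlow` tuple (sender, data type, receiver), the `build_flow_graph` helper, and the sample flows are all hypothetical and are not taken from the paper, and the LLM and RAG extraction step that would produce such tuples is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class DataFlow(NamedTuple):
    """One extracted flow (hypothetical schema, not the paper's)."""
    sender: str
    data_type: str
    receiver: str


def build_flow_graph(flows: Iterable[DataFlow]) -> Dict[str, Dict[str, List[str]]]:
    """Group flows into an adjacency map: sender -> receiver -> sorted data types."""
    graph = defaultdict(lambda: defaultdict(set))
    for flow in flows:
        graph[flow.sender][flow.receiver].add(flow.data_type)
    # Convert nested defaultdicts and sets into plain dicts and sorted lists.
    return {
        sender: {receiver: sorted(types) for receiver, types in receivers.items()}
        for sender, receivers in graph.items()
    }


# Made-up example flows, standing in for the output of an extraction step.
flows = [
    DataFlow("user", "location data", "vehicle manufacturer"),
    DataFlow("user", "contact details", "vehicle manufacturer"),
    DataFlow("vehicle manufacturer", "location data", "service partner"),
]
graph = build_flow_graph(flows)
```

Each edge in such a map carries the set of data types shared between two parties, which is the kind of structure a visualization layer could then render.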
Abstract
Privacy policies help inform people about organisations' personal data processing practices, covering different aspects such as data collection, data storage, and sharing of personal data with third parties. Privacy policies are often difficult for people to fully comprehend due to the lengthy and complex legal language used and inconsistent practices across different sectors and organisations. To help conduct automated and large-scale analyses of privacy policies, many researchers have studied applications of machine learning and natural language processing techniques, including large language models (LLMs). While a limited number of prior studies utilised LLMs for extracting personal data flows from privacy policies, our approach builds on this line of work by combining LLMs with retrieval-augmented generation (RAG) and a customised knowledge base derived from existing studies. This…
Taxonomy
Topics: Privacy, Security, and Data Protection · Privacy-Preserving Technologies in Data · Personal Information Management and User Behavior
