Automated Generation of Accurate Privacy Captions From Android Source Code Using Large Language Models
Vijayanta Jain, Sepideh Ghanavati, Sai Teja Peddinti, Collin McMillan

TL;DR
This paper introduces PCapGen, a system that combines source code analysis with large language models to automatically generate accurate, concise, and comprehensive privacy captions for Android apps.
Contribution
The work presents an approach that combines source code extraction with LLMs to generate privacy captions, addressing the limitations of previous methods that rely on questionnaires, templates, or potentially inaccurate privacy policies.
Findings
PCapGen produces privacy captions that are more accurate and complete than baseline methods.
Privacy experts prefer PCapGen-generated captions over alternatives in at least 71% of cases.
LLMs as judges favor PCapGen captions at least 76% of the time.
Abstract
Privacy captions are short sentences that succinctly describe what personal information is used, how it is used, and why, within an app. These captions can be utilized in various notice formats, such as privacy policies, app rationales, and app store descriptions. However, inaccurate captions may mislead users and expose developers to regulatory fines. Existing approaches to generating privacy notices or just privacy captions include using questionnaires, templates, static analysis, or machine learning. However, these approaches either rely heavily on developers' inputs and thus strain their efforts, use limited source code context, leading to the incomplete capture of app privacy behaviors, or depend on potentially inaccurate privacy policies as a source for creating notices. In this work, we address these limitations by developing Privacy Caption Generator (PCapGen), an approach that…
Taxonomy
Topics: Advanced Malware Detection Techniques · Privacy, Security, and Data Protection · Software Engineering Research
