An In-Depth Investigation of Data Collection in LLM App Ecosystems
Yuhao Wu, Evin Jaff, Ke Yang, Ning Zhang, Umar Iqbal

TL;DR
This paper investigates data collection practices in LLM app ecosystems, revealing excessive data collection, policy violations, and poor transparency, and proposes an LLM-based framework for analyzing and auditing these practices.
Contribution
It introduces an LLM-based framework to analyze data collection practices and privacy policy consistency in LLM app ecosystems, exemplified through a case study of OpenAI's GPT ecosystem.
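To make the input to such a framework concrete, here is a minimal sketch, not the authors' implementation. It assumes an Action is described by an OpenAPI-style specification and pulls out the parameter names and descriptions that a classifier would inspect. The names `extract_fields`, `classify`, `KEYWORD_CATEGORIES`, and `EXAMPLE_SPEC` are hypothetical. The keyword lookup is only a stand-in for the LLM classification step, and its category labels are illustrative, not the paper's taxonomy.

```python
from typing import Dict, List

HTTP_METHODS = {"get", "post", "put", "patch", "delete"}

# Hypothetical keyword map standing in for the LLM classification step.
# The category labels are illustrative, not the paper's taxonomy.
KEYWORD_CATEGORIES = {
    "email": "Contact information",
    "phone": "Contact information",
    "location": "Location",
    "address": "Location",
    "password": "Security credentials",
}

# Hypothetical Action specification, written as an OpenAPI-style dict.
EXAMPLE_SPEC = {
    "paths": {
        "/weather": {
            "get": {
                "parameters": [
                    {"name": "city", "description": "User's current location"},
                    {"name": "units", "description": "metric or imperial"},
                ]
            }
        },
        "/signup": {
            "post": {
                "parameters": [
                    {"name": "email", "description": "Account email"},
                ]
            }
        },
    }
}


def extract_fields(spec: dict) -> List[Dict[str, str]]:
    """Collect every declared parameter with its endpoint and description."""
    fields = []
    for path, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            if method.lower() not in HTTP_METHODS:
                continue  # skip non-operation keys such as path-level metadata
            for param in operation.get("parameters", []):
                fields.append({
                    "endpoint": f"{method.upper()} {path}",
                    "name": param.get("name", ""),
                    "description": param.get("description", ""),
                })
    return fields


def classify(field: Dict[str, str]) -> str:
    """Assign a category by keyword match (placeholder for an LLM call)."""
    text = f"{field['name']} {field['description']}".lower()
    for keyword, category in KEYWORD_CATEGORIES.items():
        if keyword in text:
            return category
    return "Uncategorized"


if __name__ == "__main__":
    for field in extract_fields(EXAMPLE_SPEC):
        print(field["endpoint"], field["name"], "->", classify(field))
```

The natural-language description, rather than the parameter name alone, is what reveals that `city` carries location data here, which is why analyzing the specification text matters.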
Findings
Actions collect excessive data across many categories
Several Actions violate OpenAI's policies by collecting sensitive information
Most Actions do not clearly disclose their data collection practices
Abstract
LLM app (tool) ecosystems are rapidly evolving to support sophisticated use cases that often require extensive user data collection. Because LLM apps are developed by third parties, and anecdotal evidence indicates that LLM platforms enforce their policies inconsistently, sharing user data with these apps presents significant privacy risks. In this paper, we aim to bring transparency to the data practices of LLM app ecosystems. We examine OpenAI's GPT app ecosystem as a case study. We propose an LLM-based framework to analyze the natural language specifications of GPT Actions (custom tools) and assess their data collection practices. Our analysis reveals that Actions collect excessive data across 24 categories and 145 data types, with third-party Actions collecting 6.03% more data on average. We find that several Actions violate OpenAI's policies by collecting sensitive information, such as…
Taxonomy
Topics: Scientific Computing and Data Management
Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Adam · Layer Normalization · Weight Decay · Dense Connections · Residual Connection · Linear Warmup With Cosine Annealing
