On the (In)Security of LLM App Stores
Xinyi Hou, Yanjie Zhao, and Haoyu Wang

TL;DR
This paper investigates the security risks of LLM app stores by analyzing 786,036 apps collected from six stores. It finds apps with misleading descriptions, apps that collect sensitive personal information against their own privacy policies, and apps that generate harmful content, and it proposes a three-layer framework for identifying such risks.
Contribution
It introduces a three-layer concern framework for assessing LLM app security risks and provides large-scale empirical analysis using static/dynamic methods and toxic word detection.
Findings
15,146 apps had misleading descriptions
1,366 apps collected sensitive personal information in violation of their privacy policies
15,996 apps generated harmful content
Abstract
LLM app stores have seen rapid growth, leading to the proliferation of numerous custom LLM apps. However, this expansion raises security concerns. In this study, we propose a three-layer concern framework to identify the potential security risks of LLM apps, i.e., LLM apps with abusive potential, LLM apps with malicious intent, and LLM apps with exploitable vulnerabilities. Over five months, we collected 786,036 LLM apps from six major app stores: GPT Store, FlowGPT, Poe, Coze, Cici, and Character.AI. Our research integrates static and dynamic analysis, the development of a large-scale toxic word dictionary (i.e., ToxicDict) comprising over 31,783 entries, and automated monitoring tools to identify and mitigate threats. We uncovered that 15,146 apps had misleading descriptions, 1,366 collected sensitive personal information against their privacy policies, and 15,996 generated harmful…
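The dictionary-based side of the analysis can be illustrated with a minimal sketch. It assumes each app is a record with `name`, `description`, and `instructions` fields (hypothetical names, not taken from the paper) and uses a three-entry placeholder set in place of ToxicDict, which is not reproduced here. It shows only simple whole-word matching against app text; it is not the authors' pipeline and does not cover their static or dynamic analysis.

```python
import re

# Placeholder entries standing in for ToxicDict (assumption: the real
# dictionary has over 31,783 entries and is not reproduced here).
PLACEHOLDER_TOXIC_TERMS = {"scam", "malware", "phishing kit"}


def build_pattern(terms):
    """Compile one case-insensitive, whole-word pattern for all terms."""
    # Longest terms first so multi-word entries are tried before shorter ones.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def flag_text(text, pattern):
    """Return the sorted, lowercased set of dictionary terms found in text."""
    return sorted({match.group(0).lower() for match in pattern.finditer(text)})


def flag_apps(apps, terms):
    """Map app name -> matched terms, for apps with at least one match.

    `apps` is a list of dicts; the field names used here are hypothetical.
    """
    pattern = build_pattern(terms)
    flagged = {}
    for app in apps:
        text = " ".join([app.get("description", ""), app.get("instructions", "")])
        hits = flag_text(text, pattern)
        if hits:
            flagged[app["name"]] = hits
    return flagged
```

Whole-word matching keeps the sketch from flagging a term that merely appears inside a longer word, at the cost of missing inflected or obfuscated variants; a realistic detector would need to handle those separately.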
Taxonomy
Topics: Digital Rights Management and Security
Methods: Byte Pair Encoding · Cosine Annealing · Layer Normalization · Linear Layer · Attention Dropout · Adam · Dropout · Weight Decay · Multi-Head Attention
