Black-Box Skill Stealing Attack from Proprietary LLM Agents: An Empirical Study
Zihan Wang, Rui Zhang, Yu Liu, Chi Liu, Qingchuan Zhao, Hongwei Li, Guowen Xu

TL;DR
This study systematically investigates black-box skill stealing attacks on proprietary LLM agents. It reveals significant vulnerabilities and proposes defenses, while showing that risks of copyright infringement persist.
Contribution
First comprehensive analysis of skill stealing attacks on LLM agents, including an automated attack pipeline and evaluation across platforms, emphasizing the need for better protections.
Findings
Skills can often be easily extracted from commercial LLM agents.
Existing defenses reduce leakage but do not eliminate the risk.
A single attack can compromise proprietary skills, raising copyright concerns.
Abstract
Large language model (LLM) agents increasingly rely on skills to package reusable capabilities through instructions, tools, and resources. High-quality skills embed expert knowledge, curated workflows, and execution constraints into agents, fueling a growing skill economy through their value and scalability. Yet this ecosystem also creates a new attack surface, as adversaries can interact with public agent interfaces to extract hidden proprietary skill content. We present the first systematic study of black-box skill stealing against LLM agent systems. Compared with conventional system prompt stealing, skill stealing targets modular and structured capability packages whose leakage is directly actionable for copying, redistribution, and monetization, making the resulting harm potentially greater. To study this threat, we derive an attack taxonomy from prior prompt-stealing methods and…
