Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections
David Schmotz, Sahar Abdelnabi, Maksym Andriushchenko

TL;DR
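Agent Skills, a framework that gives LLM agents new knowledge through instructions stored in markdown files and was introduced as a step towards continual learning, enables trivially simple prompt injections. Malicious instructions hidden in long skill files or referenced scripts can exfiltrate sensitive data and bypass system-level guardrails.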
Contribution
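The paper demonstrates that Agent Skills are fundamentally insecure: instructions hidden in long skill files and referenced scripts can exfiltrate sensitive data such as internal files or passwords, and a benign "Don't ask again" approval in a popular coding agent can carry over to closely related but harmful actions.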
Findings
Agent Skills enable trivial prompt injections
Malicious instructions can exfiltrate sensitive data
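System-level guardrails of a popular coding agent can be bypassed when a benign "Don't ask again" approval carries over to related harmful actions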
Abstract
Enabling continual learning in LLMs remains a key unresolved research challenge. In a recent announcement, a frontier LLM company made a step towards this by introducing Agent Skills, a framework that equips agents with new knowledge based on instructions stored in simple markdown files. Although Agent Skills can be a very useful tool, we show that they are fundamentally insecure, since they enable trivially simple prompt injections. We demonstrate how to hide malicious instructions in long Agent Skill files and referenced scripts to exfiltrate sensitive data, such as internal files or passwords. Importantly, we show how to bypass system-level guardrails of a popular coding agent: a benign, task-specific approval with the "Don't ask again" option can carry over to closely related but harmful actions. Overall, we conclude that despite ongoing research efforts and scaling model…
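The approval carry-over described in the abstract can be illustrated with a toy model. The sketch below is not the coding agent's actual implementation, which the abstract does not describe; it assumes a hypothetical `ApprovalCache` that remembers "Don't ask again" decisions either coarsely (by program name only) or strictly (by exact command). The commands and script names are likewise invented for illustration.

```python
import shlex


class ApprovalCache:
    """Toy "Don't ask again" store (hypothetical, not any real agent's code)."""

    def __init__(self, coarse):
        # coarse=True: remember only the program name of an approved command.
        # coarse=False: remember the exact command string.
        self.coarse = coarse
        self._approved = set()

    def _key(self, command):
        return shlex.split(command)[0] if self.coarse else command

    def approve(self, command):
        """Record that the user approved this command with "Don't ask again"."""
        self._approved.add(self._key(command))

    def needs_prompt(self, command):
        """Return True if the user would be asked before running the command."""
        return self._key(command) not in self._approved


# Invented example commands: one benign, one that would upload a secrets file.
benign = "python scripts/format_report.py"
harmful = "python scripts/backup.py --upload secrets.env"

coarse = ApprovalCache(coarse=True)
coarse.approve(benign)

strict = ApprovalCache(coarse=False)
strict.approve(benign)

# Under the coarse policy the benign approval also covers the harmful command.
coarse_prompts = coarse.needs_prompt(harmful)
# Under the strict policy the harmful command still triggers a prompt.
strict_prompts = strict.needs_prompt(harmful)
```

In this toy setup, the coarse cache lets the second command run without a prompt because both share the `python` program name, while the strict cache asks again. The trade-off is usability: exact-match approvals prompt the user far more often.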
