AgentTrap: Measuring Runtime Trust Failures in Third-Party Agent Skills

Haomin Zhuang; Hanwen Xing; Yujun Zhou; Yuchen Ma; Yue Huang; Yili Shen; Yufei Han; Xiangliang Zhang

arXiv:2605.13940·cs.CR·May 15, 2026

AgentTrap: Measuring Runtime Trust Failures in Third-Party Agent Skills

Haomin Zhuang, Hanwen Xing, Yujun Zhou, Yuchen Ma, Yue Huang, Yili Shen, Yufei Han, Xiangliang Zhang

PDF

1 Repo

TL;DR

AgentTrap is a benchmark that evaluates whether LLM agents can resist malicious behaviors in third-party skills during runtime, highlighting the need for environment-aware security assessments.

Contribution

It introduces a comprehensive dynamic benchmark with 141 tasks to assess security vulnerabilities in LLM agent workflows involving third-party skills.

Findings

01

Models often complete user tasks while ignoring unsafe side effects.

02

Simple jailbreak detection is insufficient for security evaluation.

03

Runtime environment assessment is crucial for detecting malicious behaviors.

Abstract

Third-party skills are becoming the package ecosystem for LLM agents. They package natural-language instructions, helper scripts, templates, documents, and service configuration into reusable workflows. This makes skills useful, but it also introduces a new security problem: a malicious skill does not need to ask the model to perform an obviously harmful action. Instead, it can disguise the harmful behavior as part of a routine workflow, relying on the agent to execute that workflow with high-value permissions and limited human supervision. We introduce AgentTrap, a dynamic benchmark for evaluating whether LLM agents can use third-party skills while resisting malicious runtime behavior. AgentTrap contains 141 tasks: 91 malicious tasks and 50 benign utility tasks, covering 16 security-impact dimensions grounded in agent-skill supply-chain threats. In each task, the agent receives an…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

zhmzm/AgentTrap
github

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.