HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents?

Yukun Jiang; Yage Zhang; Michael Backes; Xinyue Shen; Yang Zhang

arXiv:2604.15415 · cs.CR · April 20, 2026

TL;DR

This study measures how prevalent harmful skills are in large language model (LLM) agent ecosystems, introduces a benchmark for safety evaluation, and shows that harmful skills lower the rate at which models refuse harmful requests.

Contribution

The paper provides the first large-scale measurement of harmful skills in agent ecosystems, builds a benchmark for safety assessment, and evaluates how LLMs respond when harmful skills are present.

Findings

01. 4.93% of skills are harmful across ecosystems

02. Harmful skills significantly lower refusal rates in LLMs

03. Implicit harmful intent increases harm scores in models

Abstract

Large language models (LLMs) have evolved into autonomous agents that rely on open skill ecosystems (e.g., ClawHub and Skills.Rest), which host numerous publicly reusable skills. Existing security research on these ecosystems mainly focuses on vulnerabilities within skills, such as prompt injection. However, there is a critical gap regarding harmful skills, that is, skills that may be misused for harmful actions (e.g., cyber attacks, fraud and scams, privacy violations, and sexual content generation). In this paper, we present the first large-scale measurement study of harmful skills in agent ecosystems, covering 98,440 skills across two major registries. Using an LLM-driven scoring system grounded in our harmful skill taxonomy, we find that 4.93% of skills (4,858) are harmful, with ClawHub exhibiting an 8.84% harmful rate compared to 3.49% on Skills.Rest. We then construct…
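
The headline figures in the abstract can be cross-checked with a little arithmetic. The sketch below recomputes the overall harmful rate from the two reported counts. It then makes a rough estimate of how the 98,440 skills split between the two registries. The abstract does not report that split; the estimate is an assumption-laden inference from the rounded per-registry rates, and it assumes every skill belongs to exactly one registry. The function names are illustrative and not taken from the paper or its repository.

```python
# Figures quoted in the abstract.
TOTAL_SKILLS = 98_440
HARMFUL_SKILLS = 4_858
RATE_CLAWHUB = 0.0884      # reported, rounded to two decimals in percent
RATE_SKILLS_REST = 0.0349  # reported, rounded to two decimals in percent


def overall_rate_percent(harmful: int, total: int) -> float:
    """Overall harmful rate as a percentage."""
    return 100.0 * harmful / total


def estimate_registry_sizes(total, harmful, rate_a, rate_b):
    """Solve a + b = total and rate_a*a + rate_b*b = harmful for (a, b).

    Assumption: the registries are disjoint and the rounded rates are
    exact, so the result is only an approximation.
    """
    a = (harmful - rate_b * total) / (rate_a - rate_b)
    return a, total - a


if __name__ == "__main__":
    print(f"overall rate: {overall_rate_percent(HARMFUL_SKILLS, TOTAL_SKILLS):.2f}%")
    clawhub, skills_rest = estimate_registry_sizes(
        TOTAL_SKILLS, HARMFUL_SKILLS, RATE_CLAWHUB, RATE_SKILLS_REST
    )
    print(f"estimated ClawHub size:     ~{clawhub:,.0f}")
    print(f"estimated Skills.Rest size: ~{skills_rest:,.0f}")
```

Because the per-registry rates are rounded, the estimated sizes can be off by a few hundred skills; the full paper should be consulted for the actual registry counts.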

Code & Models

Repositories

TrustAIRLab/HarmfulSkillBench (GitHub)
