SkillSieve: A Hierarchical Triage Framework for Detecting Malicious AI Agent Skills

Yinghan Hou; Zongyou Yang

arXiv:2604.06550·cs.CR·April 9, 2026

TL;DR

SkillSieve is a hierarchical detection framework that identifies malicious AI agent skills by combining regex checks and static analysis with multi-layered LLM evaluation, reserving the costlier LLM stages for skills the cheaper checks flag as suspicious. The authors report better detection accuracy than prior methods.

Contribution

It introduces a three-layer detection system that applies progressively deeper analysis, using LLMs with parallel sub-tasks and voting to detect security vulnerabilities in AI agent skills.

Findings

01. Filters 86% of benign skills in under 40ms at zero API cost.

02. Achieves 0.800 F1 score on a benchmark, outperforming prior work.

03. Operates effectively on real-world skills and adversarial samples.

Abstract

OpenClaw's ClawHub marketplace hosts over 13,000 community-contributed agent skills, and between 13% and 26% of them contain security vulnerabilities according to recent audits. Regex scanners miss obfuscated payloads; formal static analyzers cannot read the natural language instructions in SKILL.md files where prompt injection and social engineering attacks hide. Neither approach handles both modalities. SkillSieve is a three-layer detection framework that applies progressively deeper analysis only where needed. Layer 1 runs regex, AST, and metadata checks through an XGBoost-based feature scorer, filtering roughly 86% of benign skills in under 40ms on average at zero API cost. Layer 2 sends suspicious skills to an LLM, but instead of asking one broad question, it splits the analysis into four parallel sub-tasks (intent alignment, permission justification, covert behavior detection,…
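
The first two layers described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the regex patterns, the risky-call list, the weights, the threshold, and the names `layer1_features`, `layer1_score`, `triage`, and `ask_llm` are all hypothetical. A toy linear scorer stands in for the XGBoost model, the script is assumed to be Python, and only the three sub-tasks named before the abstract is cut off are included.

```python
import ast
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical patterns; the paper's actual rule set is not given in the abstract.
SUSPICIOUS_PATTERNS = [
    re.compile(r"curl\s+[^|]*\|\s*(ba)?sh"),   # pipe a download into a shell
    re.compile(r"base64\s+(-d|--decode)"),     # decode an embedded payload
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
]
RISKY_CALLS = {"eval", "exec", "system", "popen"}  # assumed list, for illustration

# Sub-tasks named in the abstract; the fourth is truncated in the source, so it is omitted.
SUBTASKS = ("intent alignment", "permission justification", "covert behavior detection")


def layer1_features(skill_md: str, script: str) -> Dict[str, float]:
    """Extract cheap regex and AST features from SKILL.md text and a Python script."""
    text = skill_md + "\n" + script
    regex_hits = sum(1 for p in SUSPICIOUS_PATTERNS if p.search(text))
    risky_calls = 0
    try:
        for node in ast.walk(ast.parse(script)):
            if isinstance(node, ast.Call):
                func = node.func
                name = getattr(func, "id", None) or getattr(func, "attr", None)
                if name in RISKY_CALLS:
                    risky_calls += 1
    except SyntaxError:
        risky_calls += 1  # treat an unparseable script as a risk signal
    return {"regex_hits": float(regex_hits), "risky_calls": float(risky_calls)}


def layer1_score(features: Dict[str, float]) -> float:
    """Toy linear scorer standing in for the XGBoost model; the weights are made up."""
    raw = 0.4 * features["regex_hits"] + 0.3 * features["risky_calls"]
    return min(raw, 1.0)


def triage(
    skill_md: str,
    script: str,
    ask_llm: Callable[[str, str, str], bool],
    threshold: float = 0.3,
) -> dict:
    """Stop at Layer 1 for low scores; otherwise run the LLM sub-tasks in parallel."""
    score = layer1_score(layer1_features(skill_md, script))
    if score < threshold:
        return {"layer": 1, "score": score, "findings": {}}  # no API call made
    with ThreadPoolExecutor(max_workers=len(SUBTASKS)) as pool:
        futures = {t: pool.submit(ask_llm, t, skill_md, script) for t in SUBTASKS}
        findings = {t: f.result() for t, f in futures.items()}
    return {"layer": 2, "score": score, "findings": findings}
```

Here `ask_llm` is a caller-supplied function that takes a sub-task name plus the skill contents and returns whether that sub-task flags the skill. Keeping it injectable means the Layer 1 path can be exercised with no API access at all, which mirrors the zero-API-cost filtering the abstract describes.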
