Structured Security Auditing and Robustness Enhancement for Untrusted Agent Skills
Lijia Lv, Xuehai Tang, Jie Wen, Jizhong Han, Songlin Hu

TL;DR
This paper introduces SkillGuard-Robust, a new method for security auditing of untrusted agent skills that significantly improves detection accuracy and consistency across diverse package ecosystems.
Contribution
It formulates pre-load auditing as a robust three-way classification task and presents SkillGuard-Robust, which enhances security review through evidence extraction and semantic verification.
Findings
Achieves over 97% exact match on held-out packages.
Reaches 100% malicious-risk recall on external ecosystems.
Materially improves robustness in public-ecosystem package auditing.
Abstract
Agent Skills package SKILL.md files, scripts, reference documents, and repository context into reusable capability units, turning pre-load auditing from single-prompt filtering into cross-file security review. Existing guardrails often flag risk but recover malicious intent inconsistently under semantics-preserving rewrites. This paper formulates pre-load auditing for untrusted Agent Skills as a robust three-way classification task and introduces SkillGuard-Robust, which combines role-aware evidence extraction, selective semantic verification, and consistency-preserving adjudication. We evaluate SkillGuard-Robust on SkillGuardBench and two public-ecosystem extensions through five large evaluation views ranging from 254 to 404 packages. On the 404-package held-out aggregate, SkillGuard-Robust reaches 97.30% overall exact match, 98.33% malicious-risk recall, and 98.89% attack exact…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
