Structured Security Auditing and Robustness Enhancement for Untrusted Agent Skills

Lijia Lv; Xuehai Tang; Jie Wen; Jizhong Han; Songlin Hu

arXiv:2604.25109 · cs.CR · April 29, 2026


TL;DR

This paper introduces SkillGuard-Robust, a new method for security auditing of untrusted agent skills that significantly improves detection accuracy and consistency across diverse package ecosystems.

Contribution

It formulates pre-load auditing as a robust three-way classification task and presents SkillGuard-Robust, which enhances security review through evidence extraction and semantic verification.

Findings

01 Achieves 97.30% overall exact match on the 404-package held-out aggregate.

02 Reaches 100% malicious-risk recall on external ecosystems.

03 Materially improves robustness in public-ecosystem package auditing.

Abstract

Agent Skills package SKILL.md files, scripts, reference documents, and repository context into reusable capability units, turning pre-load auditing from single-prompt filtering into cross-file security review. Existing guardrails often flag risk but recover malicious intent inconsistently under semantics-preserving rewrites. This paper formulates pre-load auditing for untrusted Agent Skills as a robust three-way classification task and introduces SkillGuard-Robust, which combines role-aware evidence extraction, selective semantic verification, and consistency-preserving adjudication. We evaluate SkillGuard-Robust on SkillGuardBench and two public-ecosystem extensions through five large evaluation views ranging from 254 to 404 packages. On the 404-package held-out aggregate, SkillGuard-Robust reaches 97.30% overall exact match, 98.33% malicious-risk recall, and 98.89% attack exact…
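To make the reported metrics concrete, the following is a minimal sketch of how exact match and malicious-risk recall could be computed for a three-way audit. It is not the paper's evaluation code. The label names (`benign`, `suspicious`, `malicious`), the function names, and the definition of malicious-risk recall as "share of gold-malicious packages predicted malicious" are all assumptions made for illustration; the abstract does not specify them.

```python
from typing import Sequence

# Hypothetical label set: the source only states that the task is three-way.
LABELS = ("benign", "suspicious", "malicious")


def exact_match(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of packages whose predicted label equals the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("at least one package is required")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def malicious_recall(
    gold: Sequence[str], pred: Sequence[str], positive: str = "malicious"
) -> float:
    """Assumed definition: of the gold-malicious packages, the share that
    the auditor also labels malicious."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # Predictions for the packages whose gold label is the positive class.
    preds_on_positives = [p for g, p in zip(gold, pred) if g == positive]
    if not preds_on_positives:
        raise ValueError("no gold-positive packages to compute recall on")
    return sum(p == positive for p in preds_on_positives) / len(preds_on_positives)


# Toy data, invented for illustration only.
gold = ["benign", "malicious", "suspicious", "malicious"]
pred = ["benign", "malicious", "benign", "suspicious"]

em = exact_match(gold, pred)
recall = malicious_recall(gold, pred)
print(f"exact match: {em:.2f}, malicious recall: {recall:.2f}")
```

Under these assumptions, exact match penalizes any label disagreement, while recall looks only at packages that are truly malicious, which is why the two figures can diverge on the same evaluation view.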
