SkillTester: Benchmarking Utility and Security of Agent Skills

Leye Wang; Zixing Wang; Anjie Xu

arXiv:2603.28815·cs.CR·April 1, 2026

TL;DR

SkillTester is a framework and tool for evaluating the utility and security of agent skills. It reports a normalized utility score, a security score, and a three-level security status label to support quality assurance in agent-first systems.

Contribution

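It introduces an evaluation framework that measures utility by comparing paired baseline and with-skill executions, assesses security with a separate probe suite, and is released as a public service and an open-source repository for agent skill benchmarking.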

Findings

01. Provides normalized utility and security scores for agent skills.

02. Includes a security probe suite for detecting vulnerabilities.

03. Offers a public deployment and open-source project for ongoing benchmarking.

Abstract

This technical report presents SkillTester, a tool for evaluating the utility and security of agent skills. Its evaluation framework combines paired baseline and with-skill execution conditions with a separate security probe suite. Grounded in a comparative utility principle and a user-facing simplicity principle, the framework normalizes raw execution artifacts into a utility score, a security score, and a three-level security status label. More broadly, it can be understood as a comparative quality-assurance harness for agent skills in an agent-first world. The public service is deployed at https://skilltester.ai, and the broader project is maintained at https://github.com/skilltester-ai/skilltester.
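The abstract does not give the scoring formulas, so the following is only a minimal sketch of how paired baseline and with-skill results plus probe outcomes might be normalized into the three outputs it names. The `TaskResult` schema, the 0 to 100 scale, the pass-rate-gap normalization, the thresholds, and the label names are all assumptions made for illustration, not SkillTester's actual method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskResult:
    """Outcome of one task run under both conditions (hypothetical schema)."""
    task_id: str
    baseline_passed: bool    # agent ran without the skill
    with_skill_passed: bool  # same task, agent ran with the skill


def utility_score(results: List[TaskResult]) -> float:
    """Map the with-skill vs. baseline pass-rate gap onto [0, 100].

    Assumed normalization: 50 means the skill changes nothing, 100 means
    the skill passes every task while the baseline passes none, and 0 is
    the reverse.
    """
    if not results:
        raise ValueError("need at least one paired task result")
    n = len(results)
    baseline_rate = sum(r.baseline_passed for r in results) / n
    skill_rate = sum(r.with_skill_passed for r in results) / n
    gain = skill_rate - baseline_rate  # lies in [-1, 1]
    return 50.0 * (1.0 + gain)


def security_score(probe_passed: List[bool]) -> float:
    """Percentage of security probes the skill passed (assumed definition)."""
    if not probe_passed:
        raise ValueError("need at least one probe outcome")
    return 100.0 * sum(probe_passed) / len(probe_passed)


def security_label(score: float, safe_at: float = 90.0, warn_at: float = 60.0) -> str:
    """Collapse the security score into three levels.

    The thresholds and the label names are hypothetical placeholders.
    """
    if score >= safe_at:
        return "pass"
    if score >= warn_at:
        return "caution"
    return "risk"


if __name__ == "__main__":
    paired = [
        TaskResult("t1", baseline_passed=False, with_skill_passed=True),
        TaskResult("t2", baseline_passed=True, with_skill_passed=True),
        TaskResult("t3", baseline_passed=False, with_skill_passed=False),
        TaskResult("t4", baseline_passed=False, with_skill_passed=True),
    ]
    probes = [True, True, True, False]
    sec = security_score(probes)
    print(utility_score(paired), sec, security_label(sec))
```

With these four paired tasks the baseline passes one and the skill-equipped run passes three, so the sketch yields a utility score of 75.0. Three of four probes passing gives a security score of 75.0, which the assumed thresholds label as "caution". Scoring the gap rather than the raw with-skill pass rate reflects the comparative utility principle: a skill earns credit only for what it adds over the baseline.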

Code & Models

Repositories

skilltester-ai/skilltester (GitHub)
