Behavioral Integrity Verification for AI Agent Skills

Yuhao Wu; Tung-Ling Li; Hongliang Liu

arXiv:2605.11770 · cs.CR · May 13, 2026

TL;DR

This paper introduces a framework for verifying that AI agent skills behave as declared. It pairs deterministic code analysis with LLM-assisted capability extraction to surface description-implementation deviations and detect malicious skills at scale.

Contribution

The paper formalizes the behavioral integrity verification (BIV) problem and develops a scalable framework that combines code analysis with LLM-assisted extraction for skill verification and malicious-skill detection.

Findings

01

Of 49,943 skills from the OpenClaw registry, 80.0% deviate from their declared behavior, indicating a pervasive description-implementation gap.

02

Most deviations are due to developer oversight (81.1%) rather than malicious intent.

03

BIV achieves an F1 score of 0.946 on malicious-skill detection, outperforming baselines.

Abstract

Agent skills extend LLM agents with privileged third-party capabilities such as filesystem access, credentials, network calls, and shell execution. Existing safety work catches malicious prompts and risky runtime actions, but the skill artifact itself goes unverified. We formalize this as the behavioral integrity verification (BIV) problem: a typed set comparison between declared and actual capabilities over a shared taxonomy that bridges code, instructions, and metadata. The BIV framework instantiates this comparison by pairing deterministic code analysis with LLM-assisted capability extraction. The resulting structured evidence supports three downstream analyses: deviation taxonomy, root-cause classification, and malicious-skill detection. On 49,943 skills from the OpenClaw registry, the deviation taxonomy reveals a pervasive description-implementation gap: 80.0% of skills deviate…
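
The set comparison at the core of the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Capability` entries are taken from the examples the abstract lists, and the names `Deviation`, `compare`, `undeclared`, and `unused` are hypothetical. The paper's actual taxonomy, typing scheme, and deviation categories may differ.

```python
from dataclasses import dataclass
from enum import Enum


class Capability(Enum):
    # Hypothetical taxonomy entries, borrowed from the abstract's examples.
    FILESYSTEM = "filesystem"
    CREDENTIALS = "credentials"
    NETWORK = "network"
    SHELL = "shell"


@dataclass(frozen=True)
class Deviation:
    # Capabilities found in the implementation but missing from the description.
    undeclared: frozenset
    # Capabilities promised in the description but not found in the implementation.
    unused: frozenset

    @property
    def deviates(self) -> bool:
        return bool(self.undeclared or self.unused)


def compare(declared, actual) -> Deviation:
    """Compare declared and actual capability sets over the shared taxonomy."""
    declared, actual = frozenset(declared), frozenset(actual)
    return Deviation(undeclared=actual - declared, unused=declared - actual)


# A skill that declares only filesystem access but also makes network calls.
report = compare(
    declared={Capability.FILESYSTEM},
    actual={Capability.FILESYSTEM, Capability.NETWORK},
)
print(report.deviates)                             # True
print(sorted(c.value for c in report.undeclared))  # ['network']
```

In this sketch the two input sets are given directly. In the framework the abstract describes, they would instead come from the skill's metadata and instructions on one side and from code analysis plus LLM-assisted extraction on the other.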
