Counterfactual Trace Auditing of LLM Agent Skills
Xiaolin Zhou, Jinbo Liu, Li Li, Ryan A. Rossi, and Xiyang Hu

TL;DR
This paper introduces Counterfactual Trace Auditing (CTA), a novel framework for analyzing how skills influence large language model agents' behavior beyond simple pass rate metrics.
Contribution
CTA provides a structured method to measure and interpret the behavioral effects of skills on LLM agents, revealing nuanced impacts overlooked by traditional benchmarks.
Findings
Pass rate changes are minimal (+0.3 percentage points) despite substantial behavioral shifts.
CTA identifies 522 skill influence patterns, including off-task artifact creation and task recovery.
High-baseline tasks show the most skill effects, but their pass rates are already saturated.
Abstract
Large Language Model agents are increasingly augmented with agent skills. Current evaluation methods for skills remain limited. Most deployed benchmarks report only pass rate before and after a skill is attached, treating the skill as a black-box change to agent behavior. We introduce Counterfactual Trace Auditing (CTA), a framework for measuring how a skill changes agent behavior. CTA pairs each with-skill agent trace with a without-skill counterpart on the same task, segments both traces into goal-directed phases, aligns the phases, and emits structured Skill Influence Pattern (SIP) annotations. These annotations describe the behavioral effect of a skill rather than only its task outcome. We instantiate CTA on SWE-Skills-Bench with Claude across 49 software engineering tasks. The resulting audit reveals a clear evaluation gap. Pass rate changes by only +0.3 percentage points on…
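
The pipeline named in the abstract (pair, segment, align, annotate) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Step` record, the per-step goal labels, the use of `difflib.SequenceMatcher` for alignment, and the annotation names `phase_changed`, `phase_added`, and `phase_removed` are all hypothetical stand-ins, and the paper's actual SIP taxonomy is not reproduced here.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import groupby


@dataclass(frozen=True)
class Step:
    goal: str    # hypothetical goal label for the step, e.g. "explore"
    action: str  # hypothetical action text, e.g. a shell command


def segment(trace):
    """Group consecutive steps that share a goal label into phases."""
    return [(goal, [s.action for s in steps])
            for goal, steps in groupby(trace, key=lambda s: s.goal)]


def audit(with_skill, without_skill):
    """Align the phases of a trace pair and emit illustrative annotations."""
    base, treated = segment(without_skill), segment(with_skill)
    # Align the two phase sequences by their goal labels (assumed heuristic).
    matcher = SequenceMatcher(a=[g for g, _ in base],
                              b=[g for g, _ in treated],
                              autojunk=False)
    annotations = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Same phase in both traces: flag it only if the actions differ.
            for (goal, a), (_, b) in zip(base[i1:i2], treated[j1:j2]):
                if a != b:
                    annotations.append(("phase_changed", goal))
        else:
            if tag in ("delete", "replace"):
                annotations += [("phase_removed", g) for g, _ in base[i1:i2]]
            if tag in ("insert", "replace"):
                annotations += [("phase_added", g) for g, _ in treated[j1:j2]]
    return annotations


# Toy trace pair for one task (made-up data, for illustration only).
without_skill = [Step("explore", "ls"), Step("edit", "patch a.py"),
                 Step("test", "pytest")]
with_skill = [Step("explore", "ls"), Step("edit", "patch a.py"),
              Step("edit", "patch b.py"), Step("document", "write NOTES.md"),
              Step("test", "pytest")]

print(audit(with_skill, without_skill))
```

In this toy pair both runs could end with the same passing test, so a pass rate comparison would show no difference, while the audit still reports a changed edit phase and an added documentation phase. That is the kind of behavioral difference the abstract says pass rate alone does not capture.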
