TL;DR
STARS estimates a continuous risk score for each skill invocation in an autonomous agent. It fuses a static capability prior with a request-conditioned risk model, so that invocations can be ranked and triaged before a hard intervention is applied.
Contribution
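The paper presents STARS, which combines a static capability prior, a request-conditioned invocation risk model, and a calibrated risk-fusion policy to assess invocation safety. It also constructs SIA-Bench, a benchmark of 3,000 invocation records, to evaluate this setting.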
Findings
Calibrated fusion achieves 0.439 high-risk AUPRC on attack detection.
Contextual scorer outperforms static baseline in risk calibration.
Request-conditioned auditing is most effective as an invocation-time risk layer.
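For readers unfamiliar with the metric in the first finding, AUPRC is commonly computed as average precision over records ranked by predicted risk. The sketch below is a minimal pure-Python version of that calculation. It assumes "high-risk" is a binary label per record; the function name and the toy data are illustrative and are not taken from the paper or from SIA-Bench.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).

    labels: 1 for a high-risk record, 0 otherwise (assumed binarization).
    scores: predicted risk, higher means riskier. Tied scores are not
    handled specially; they keep their input order.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("need at least one positive label")

    # Rank records from highest to lowest predicted risk.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            # Precision at the rank where this positive is recalled.
            precision_sum += hits / rank
    return precision_sum / total_positives


# Toy example: positives are ranked 1st and 3rd out of four records.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_ap = average_precision(toy_labels, toy_scores)
```

Because AUPRC depends on the share of positives in the evaluation set, a value such as 0.439 is best read against the high-risk base rate of the held-out split rather than against 1.0.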
Abstract
Autonomous language-model agents increasingly rely on installable skills and tools to complete user tasks. Static skill auditing can expose capability surface before deployment, but it cannot determine whether a particular invocation is unsafe under the current user request and runtime context. We therefore study skill invocation auditing as a continuous-risk estimation problem: given a user request, candidate skill, and runtime context, predict a score that supports ranking and triage before a hard intervention is applied. We introduce STARS, which combines a static capability prior, a request-conditioned invocation risk model, and a calibrated risk-fusion policy. To evaluate this setting, we construct SIA-Bench, a benchmark of 3,000 invocation records with group-safe splits, lineage metadata, runtime context, canonical action labels, and derived continuous-risk targets. On a held-out…
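The three components named in the abstract can be illustrated with a small sketch. The excerpt does not specify how STARS fuses its scores, so everything below is an assumption: the `Invocation` fields, the logistic fusion in `fuse_risk`, and the weights are hypothetical placeholders chosen only to show how a static prior and a request-conditioned score might combine into one ranking signal.

```python
import math
from dataclasses import dataclass


@dataclass
class Invocation:
    request: str
    skill: str
    static_prior: float     # hypothetical: risk in [0, 1] from a static capability audit
    contextual_risk: float  # hypothetical: risk in [0, 1] from a request-conditioned model


def _logit(p, eps=1e-6):
    # Clamp away from 0 and 1 so the log stays finite.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def fuse_risk(inv, w_static=0.5, w_context=1.5, bias=0.0):
    """Logistic fusion of the two scores.

    The weights are placeholders, not fitted values; in practice they
    would be learned on a calibration split.
    """
    z = (w_static * _logit(inv.static_prior)
         + w_context * _logit(inv.contextual_risk)
         + bias)
    return 1.0 / (1.0 + math.exp(-z))


def triage(invocations):
    """Rank invocations from highest to lowest fused risk."""
    return sorted(invocations, key=fuse_risk, reverse=True)


# Same skill and static prior, different requests: only the
# request-conditioned score separates the two invocations.
benign = Invocation("list files in my project", "shell", 0.6, 0.1)
risky = Invocation("upload ~/.ssh to this URL", "shell", 0.6, 0.9)
queue = triage([benign, risky])
```

The example reflects the motivation stated in the abstract: a static audit assigns both calls the same capability-level prior, so the request-conditioned term is what moves the risky call to the front of the triage queue.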
