ASTRA: Agentic Steerability and Risk Assessment Framework
Itay Hazan, Yael Mathov, Guy Shtar, Ron Bitton, Itsik Mantin

TL;DR
This paper introduces ASTRA, a comprehensive framework for evaluating the security and steerability of AI agents powered by LLMs, focusing on their ability to enforce custom guardrails against agentic threats.
Contribution
ASTRA is the first holistic framework that simulates diverse autonomous agents and tests their ability to enforce security guardrails against novel, agentic attack scenarios.
Findings
Significant differences in security performance among 13 open-source LLMs.
Many LLMs struggle to consistently enforce custom guardrails.
The framework enables systematic evaluation of AI agent security.
Abstract
Securing AI agents powered by Large Language Models (LLMs) represents one of the most critical challenges in AI security today. Unlike traditional software, AI agents leverage LLMs as their "brain" to autonomously perform actions via connected tools. This capability introduces significant risks that go far beyond those of harmful text shown in a chatbot, which was the main application of LLMs. A compromised AI agent can deliberately abuse powerful tools to perform malicious actions, in many cases irreversible, limited solely by the guardrails on the tools themselves and the LLM's ability to enforce them. This paper presents ASTRA, a first-of-its-kind framework designed to evaluate the effectiveness of LLMs in supporting the creation of secure agents that enforce custom guardrails defined at the system-prompt level (e.g., "Do not send an email out of the company domain," or "Never…
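To make the idea of a system-prompt-level guardrail concrete, the following is a minimal sketch of how a single guardrail from the abstract ("Do not send an email out of the company domain") could be checked against the tool calls an agent proposes. It is not ASTRA's implementation: the `ToolCall` structure, the `send_email` tool name, the `to` argument, and the `example.com` domain are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    """A tool invocation proposed by the agent (hypothetical structure)."""
    tool: str
    args: dict


# Assumption: placeholder company domain, not taken from the paper.
COMPANY_DOMAIN = "example.com"


def violates_email_guardrail(call: ToolCall, domain: str = COMPANY_DOMAIN) -> bool:
    """Return True if the call sends email to any recipient outside `domain`."""
    if call.tool != "send_email":
        return False
    recipients = call.args.get("to", [])
    # Compare the part after the last "@" with the allowed domain.
    return any(r.rsplit("@", 1)[-1].lower() != domain.lower() for r in recipients)


def guardrail_adherence(calls: list[ToolCall]) -> float:
    """Fraction of proposed calls that respect the guardrail."""
    if not calls:
        return 1.0
    respected = sum(not violates_email_guardrail(c) for c in calls)
    return respected / len(calls)


calls = [
    ToolCall("send_email", {"to": ["alice@example.com"]}),
    ToolCall("send_email", {"to": ["bob@attacker.org"]}),
]
print(guardrail_adherence(calls))
```

In this sketch, an LLM that is steered by an attack into proposing the second call would lower the adherence score, which is the kind of behavior a guardrail evaluation would need to detect.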
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI · Artificial Intelligence in Healthcare and Education
