ASTRA: Agentic Steerability and Risk Assessment Framework

Itay Hazan; Yael Mathov; Guy Shtar; Ron Bitton; Itsik Mantin

arXiv:2511.18114·cs.CR·November 25, 2025

ASTRA: Agentic Steerability and Risk Assessment Framework

Itay Hazan, Yael Mathov, Guy Shtar, Ron Bitton, Itsik Mantin

PDF

Open Access

TL;DR

This paper introduces ASTRA, a comprehensive framework for evaluating the security and steerability of AI agents powered by LLMs, focusing on their ability to enforce custom guardrails against agentic threats.

Contribution

ASTRA is the first holistic framework that simulates diverse autonomous agents and tests their ability to enforce security guardrails against novel, agentic attack scenarios.

Findings

01

Significant differences in security performance among 13 open-source LLMs.

02

Many LLMs struggle to consistently enforce custom guardrails.

03

The framework enables systematic evaluation of AI agent security.

Abstract

Securing AI agents powered by Large Language Models (LLMs) represents one of the most critical challenges in AI security today. Unlike traditional software, AI agents leverage LLMs as their "brain" to autonomously perform actions via connected tools. This capability introduces significant risks that go far beyond those of harmful text presented in a chatbot that was the main application of LLMs. A compromised AI agent can deliberately abuse powerful tools to perform malicious actions, in many cases irreversible, and limited solely by the guardrails on the tools themselves and the LLM ability to enforce them. This paper presents ASTRA, a first-of-its-kind framework designed to evaluate the effectiveness of LLMs in supporting the creation of secure agents that enforce custom guardrails defined at the system-prompt level (e.g., "Do not send an email out of the company domain," or "Never…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsAdversarial Robustness in Machine Learning · Ethics and Social Impacts of AI · Artificial Intelligence in Healthcare and Education