A Safety and Security Framework for Real-World Agentic Systems

Shaona Ghosh; Barnaby Simkin; Kyriacos Shiarlis; Soumili Nandi; Dan Zhao; Matthew Fiedler; Julia Bazinska; Nikki Pope; Roopa Prabhu; Daniel Rohrer; Michael Demoret; Bartley Richardson

arXiv:2511.21990 · cs.LG · December 1, 2025

Open Access · 2 Datasets

TL;DR

This paper presents a comprehensive, dynamic framework for ensuring safety and security in agentic AI systems, emphasizing risk management through AI-driven red teaming and human oversight in complex enterprise environments.

Contribution

It introduces a novel agentic risk taxonomy and a dynamic safety framework that integrates auxiliary AI models and human oversight for risk discovery and mitigation.

Findings

01 Effective identification of novel agentic risks through sandboxed red teaming.

02 Demonstrated safety evaluation on NVIDIA's AI-Q Research Assistant.

03 Released a dataset with 10,000+ attack and defense traces for agentic workflows.

Abstract

This paper introduces a dynamic and actionable framework for securing agentic AI systems in enterprise deployment. We contend that safety and security are not merely fixed attributes of individual models but also emergent properties arising from the dynamic interactions among models, orchestrators, tools, and data within their operating environments. We propose a new way of identifying novel agentic risks through the lens of user safety. Although safety and security are clearly separated for traditional LLMs and for agentic models in isolation, they appear to be connected when viewed through the lens of safety in agentic systems. Building on this foundation, we define an operational agentic risk taxonomy that unifies traditional safety and security concerns with novel, uniquely agentic risks, including tool misuse, cascading action chains, and unintended control amplification, among others. At…

Taxonomy

Topics: Information and Cyber Security · Access Control and Trust · Multi-Agent Systems and Negotiation