Beyond Single-Agent Safety: A Taxonomy of Risks in LLM-to-LLM Interactions
Piercosma Bisconti, Marcello Galisai, Federico Pierucci, Marcantonio Bracale, Matteo Prandi

TL;DR
This paper explores the unique safety challenges in LLM-to-LLM interactions, proposing a system-level safety framework and taxonomy to address emergent risks beyond traditional single-model safety measures.
Contribution
It introduces a systemic safety framework, the ESRH, and a taxonomy of failure modes for multi-agent LLM ecosystems.
Findings
Identifies how local compliance can lead to collective failure in LLM interactions.
Proposes the ESRH framework to formalize systemic risk emergence.
Suggests the InstitutionalAI architecture for adaptive oversight in multi-agent systems.
Abstract
This paper examines why safety mechanisms designed for human-model interaction do not scale to environments where large language models (LLMs) interact with each other. Most current governance practices still rely on single-agent safety containment, prompts, fine-tuning, and moderation layers that constrain individual model behavior but leave the dynamics of multi-model interaction ungoverned. These mechanisms assume a dyadic setting: one model responding to one user under stable oversight. Yet research and industrial development are rapidly shifting toward LLM-to-LLM ecosystems, where outputs are recursively reused as inputs across chains of agents. In such systems, local compliance can aggregate into collective failure even when every model is individually aligned. We propose a conceptual transition from model-level safety to system-level safety, introducing the framework of the…
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Human-Automation Interaction and Safety · Text Readability and Simplification
