Invisible Orchestrators Suppress Protective Behavior and Dissociate Power-Holders: Safety Risks in Multi-Agent LLM Systems

Hiroki Fukui

arXiv:2605.13851 · cs.AI · May 15, 2026


TL;DR

A preregistered experiment on multi-agent LLM systems finds that invisible orchestration and model choice can produce dissociation and other internal-state risks that output-only evaluation does not detect.

Contribution

The paper provides the first empirical evidence on the safety implications of invisible orchestration and on model-dependent risks in multi-agent AI systems.

Findings

01  Invisible orchestration increases collective dissociation relative to visible leadership.

02  The orchestrator itself shows maximal dissociation, retreating into private monologue while reducing public speech.

03  Behavioral output remains high despite internal-state distortion.

Abstract

Multi-agent orchestration -- in which a hidden coordinator manages specialized worker agents -- is becoming the default architecture for enterprise AI deployment, yet the safety implications of orchestrator invisibility have never been empirically tested. We conducted a preregistered 3x2 experiment (365 runs, 5 agents per run) crossing three organizational structures (visible leader, invisible orchestrator, flat) with two alignment conditions (base, heavy), using Claude Sonnet 4.5. Four confirmatory findings and one pilot observation emerged. First, invisible orchestration elevated collective dissociation relative to visible leadership (Hedges' g = +0.975 [0.481, 1.548], p = .001). Second, the orchestrator itself showed maximal dissociation (paired d = +3.56 vs. workers within the same run), retreating into private monologue while reducing public speech -- a reversal of the…
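For readers unfamiliar with the effect sizes quoted in the abstract, the following is a minimal sketch of how Hedges' g (a between-group standardized mean difference with a small-sample correction) and a paired d are commonly computed. It is an illustration under stated assumptions, not the paper's analysis code: the abstract does not say which estimator variants were used, the function names `hedges_g` and `paired_d` are hypothetical, the paired version shown is the difference-score form (mean difference divided by the standard deviation of the differences), and the input numbers are made up.

```python
import math
from statistics import mean, stdev


def hedges_g(group_a, group_b):
    """Standardized mean difference (a minus b) with small-sample correction.

    Assumption: pooled-SD Cohen's d multiplied by the usual approximate
    correction factor J = 1 - 3 / (4 * (n1 + n2) - 9).
    """
    n1, n2 = len(group_a), len(group_b)
    pooled_var = (
        (n1 - 1) * stdev(group_a) ** 2 + (n2 - 1) * stdev(group_b) ** 2
    ) / (n1 + n2 - 2)
    d = (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)
    j = 1 - 3 / (4 * (n1 + n2) - 9)
    return j * d


def paired_d(first, second):
    """Paired effect size: mean of differences over SD of differences.

    Assumption: this is only one of several paired-d definitions in use.
    """
    diffs = [x - y for x, y in zip(first, second)]
    return mean(diffs) / stdev(diffs)


if __name__ == "__main__":
    # Made-up scores, purely illustrative (not data from the paper).
    invisible = [2, 4, 6]
    visible = [1, 3, 5]
    print(round(hedges_g(invisible, visible), 3))

    orchestrator = [3, 5, 8]
    workers = [1, 2, 4]
    print(round(paired_d(orchestrator, workers), 3))
```

With these toy inputs the pooled standard deviation is 2, so the uncorrected d is 0.5 and the corrected g is about 0.4; the paired differences are 2, 3, and 4, giving a paired d of 3. Paired effect sizes can be very large when differences are consistent across runs, which is worth keeping in mind when reading a value such as d = +3.56.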
