
TL;DR
This paper proposes an organizational approach to multi-agent AI systems, using architectural design and compartmentalization inspired by human institutions to improve reliability and mitigate misaligned behaviors.
Contribution
It introduces the Perseverance Composition Engine, demonstrating how layered verification and institutional design can foster reliable outcomes from unreliable AI components.
Findings
Layered verification detects unsupported claims.
Architectural enforcement encourages honest refusal.
Observed patterns are consistent with the institutional hypothesis.
Abstract
Alignment research focuses on making individual AI systems reliable. Human institutions achieve reliable collective behaviour differently: they mitigate the risk posed by misaligned individuals through organisational structure. Multi-agent AI systems should follow this institutional model using compartmentalisation and adversarial review to achieve reliable outcomes through architectural design rather than assuming individual alignment. We demonstrate this approach through the Perseverance Composition Engine, a multi-agent system for document composition. The Composer drafts text, the Corroborator verifies factual substantiation with full source access, and the Critic evaluates argumentative quality without access to sources: information asymmetry enforced by system architecture. This creates layered verification: the Corroborator detects unsupported claims, whilst the Critic…
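The information asymmetry described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the names `Draft`, `Sources`, `review`, `toy_corroborator` and `toy_critic` are hypothetical, and the substring-matching rule is a stand-in for whatever verification the real Corroborator performs. The sketch only shows the idea that the Critic's interface never receives the sources, so the restriction holds by construction rather than by instruction.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Draft:
    """Text produced by the Composer, reduced here to a tuple of claims."""
    claims: Tuple[str, ...]


@dataclass(frozen=True)
class Sources:
    """Source material; only the Corroborator is ever handed this object."""
    passages: Tuple[str, ...]


# Hypothetical role interfaces. The Critic's signature has no Sources
# parameter, so it cannot consult the sources even if it "wanted" to.
Corroborator = Callable[[Draft, Sources], List[str]]  # returns unsupported claims
Critic = Callable[[Draft], List[str]]                 # returns objections


def review(draft: Draft, sources: Sources,
           corroborator: Corroborator, critic: Critic) -> Dict[str, object]:
    """Run both review layers and accept only if neither raises an issue."""
    unsupported = corroborator(draft, sources)  # full source access
    objections = critic(draft)                  # draft only, no sources
    return {
        "unsupported": unsupported,
        "objections": objections,
        "accepted": not unsupported and not objections,
    }


def toy_corroborator(draft: Draft, sources: Sources) -> List[str]:
    # Toy rule (an assumption): a claim counts as supported only if it
    # appears verbatim in at least one source passage.
    return [c for c in draft.claims
            if not any(c in p for p in sources.passages)]


def toy_critic(draft: Draft) -> List[str]:
    # Toy rule (an assumption): object only to an empty draft.
    return [] if draft.claims else ["draft makes no claims"]
```

Passing a draft with one claim that is absent from the sources causes `review` to list that claim under `"unsupported"` and to set `"accepted"` to `False`, regardless of what the Critic returns.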
Taxonomy
Topics: Ethics and Social Impacts of AI · Multi-Agent Systems and Negotiation · Embodied and Extended Cognition
