The Alignment Flywheel: A Governance-Centric Hybrid MAS for Architecture-Agnostic Safety
Elias Malomgré, Pieter Simoens

TL;DR
The paper introduces the Alignment Flywheel, a governance-centric hybrid multi-agent system architecture that decouples decision-making from safety oversight, enabling safer deployment of autonomous systems through auditability and version control.
Contribution
It formalizes a flexible, implementation-agnostic architecture that separates decision generation from safety governance, facilitating runtime safety updates without retraining.
Findings
The architecture enables patch locality when mitigating safety failures.
The framework supports runtime gating, auditing, and staged rollout.
Decision components are decoupled from safety oversight, which improves auditability.
Abstract
Multi-agent systems provide mature methodologies for role decomposition, coordination, and normative governance, capabilities that remain essential as increasingly powerful autonomous decision components are embedded within agent-based systems. While learned and generative models substantially expand system capability, their safety behavior is often entangled with training, making it opaque, difficult to audit, and costly to update after deployment. This paper formalizes the Alignment Flywheel as a governance-centric hybrid MAS architecture that decouples decision generation from safety governance. A Proposer, representing any autonomous decision component, generates candidate trajectories, while a Safety Oracle returns raw safety signals through a stable interface. An enforcement layer applies explicit risk policy at runtime, and a governance MAS supervises the Oracle through auditing,…
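The separation of roles described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `SafetySignal`, `RiskPolicy`, `enforce`, `toy_proposer`, and `toy_oracle`, the scalar risk score in [0, 1], and the threshold-based policy are all assumptions made for illustration. The sketch only shows the idea that a Proposer emits candidate trajectories, a Safety Oracle returns raw signals through a fixed interface, and a separate enforcement step applies an explicit risk policy while writing an audit record.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical representation: a trajectory is a sequence of numbers.
Trajectory = Sequence[float]


@dataclass(frozen=True)
class SafetySignal:
    risk: float          # raw risk score; the [0, 1] scale is an assumption
    oracle_version: str  # recorded so each decision can be audited later


# Stable interfaces (assumed shapes): any callable with these signatures fits.
Proposer = Callable[[], List[Trajectory]]
SafetyOracle = Callable[[Trajectory], SafetySignal]


@dataclass(frozen=True)
class RiskPolicy:
    max_risk: float  # explicit threshold, adjustable without touching the Proposer


def enforce(
    proposer: Proposer,
    oracle: SafetyOracle,
    policy: RiskPolicy,
    audit_log: List[Dict[str, object]],
) -> Optional[Trajectory]:
    """Return the lowest-risk candidate that satisfies the policy, or None."""
    best: Optional[Trajectory] = None
    best_risk: Optional[float] = None
    for candidate in proposer():
        signal = oracle(candidate)
        accepted = signal.risk <= policy.max_risk
        # Every evaluation is logged, whether accepted or rejected.
        audit_log.append({
            "trajectory": list(candidate),
            "risk": signal.risk,
            "oracle_version": signal.oracle_version,
            "accepted": accepted,
        })
        if accepted and (best_risk is None or signal.risk < best_risk):
            best, best_risk = candidate, signal.risk
    return best


# Toy stand-ins for a learned Proposer and a governed Oracle (illustrative only).
def toy_proposer() -> List[Trajectory]:
    return [[0.1, 0.2], [0.9, 1.5], [0.0, 0.05]]


def toy_oracle(trajectory: Trajectory) -> SafetySignal:
    # Assumed scoring rule: largest absolute value, capped at 1.0.
    risk = min(1.0, max(abs(x) for x in trajectory))
    return SafetySignal(risk=risk, oracle_version="v1")


audit_log: List[Dict[str, object]] = []
chosen = enforce(toy_proposer, toy_oracle, RiskPolicy(max_risk=0.5), audit_log)
```

In this sketch, tightening `RiskPolicy.max_risk` or swapping in a different `toy_oracle` changes which trajectories pass without modifying the Proposer, which is the kind of decoupling the abstract describes.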
