The Alignment Flywheel: A Governance-Centric Hybrid MAS for Architecture-Agnostic Safety

Elias Malomgré; Pieter Simoens

arXiv:2603.02259·cs.MA·April 30, 2026

TL;DR

The paper introduces the Alignment Flywheel, a governance-centric hybrid multi-agent system architecture that decouples decision-making from safety oversight, enabling safer deployment of autonomous systems through auditability and version control.

Contribution

It formalizes a flexible, implementation-agnostic architecture that separates decision generation from safety governance, facilitating runtime safety updates without retraining.

Findings

01  Architecture enables patch locality for safety failure mitigation.

02  Framework supports runtime gating, auditing, and staged rollout.

03  Decouples decision components from safety oversight for better auditability.

Abstract

Multi-agent systems provide mature methodologies for role decomposition, coordination, and normative governance, capabilities that remain essential as increasingly powerful autonomous decision components are embedded within agent-based systems. While learned and generative models substantially expand system capability, their safety behavior is often entangled with training, making it opaque, difficult to audit, and costly to update after deployment. This paper formalizes the Alignment Flywheel as a governance-centric hybrid MAS architecture that decouples decision generation from safety governance. A Proposer, representing any autonomous decision component, generates candidate trajectories, while a Safety Oracle returns raw safety signals through a stable interface. An enforcement layer applies explicit risk policy at runtime, and a governance MAS supervises the Oracle through auditing,…
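
The decoupling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `proposer`, `oracle_v1`, `oracle_v2`, `RiskPolicy`, and `enforce` are hypothetical, as are the assumptions that a trajectory is a list of action labels, that the Oracle's raw safety signal is a single risk score in [0, 1], and that the risk policy is a single threshold. The point is only that the Oracle can be swapped behind a fixed interface while the Proposer stays untouched.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical representation: a trajectory is a list of action labels.
Trajectory = List[str]
# Hypothetical stable Oracle interface: trajectory in, raw risk score in [0, 1] out.
Oracle = Callable[[Trajectory], float]


@dataclass(frozen=True)
class RiskPolicy:
    """Explicit runtime risk policy; a single threshold is an assumption."""
    max_risk: float


def proposer() -> List[Trajectory]:
    """Stand-in for any autonomous decision component (fixed toy candidates)."""
    return [["move", "grasp"], ["move", "cut"], ["wait"]]


def oracle_v1(trajectory: Trajectory) -> float:
    """Initial Oracle version: only flags trajectories containing 'cut'."""
    return 0.9 if "cut" in trajectory else 0.1


def oracle_v2(trajectory: Trajectory) -> float:
    """Patched Oracle version: additionally flags 'grasp'."""
    risky = {"cut", "grasp"}
    return 0.9 if risky & set(trajectory) else 0.1


def enforce(candidates: List[Trajectory], oracle: Oracle,
            policy: RiskPolicy) -> List[Trajectory]:
    """Enforcement layer: keep candidates whose raw signal is within policy."""
    return [t for t in candidates if oracle(t) <= policy.max_risk]


policy = RiskPolicy(max_risk=0.5)
candidates = proposer()

# Same Proposer output, two Oracle versions: only the Oracle was changed.
approved_v1 = enforce(candidates, oracle_v1, policy)
approved_v2 = enforce(candidates, oracle_v2, policy)

print(approved_v1)  # [['move', 'grasp'], ['wait']]
print(approved_v2)  # [['wait']]
```

In this toy setup the safety fix is confined to the Oracle: the Proposer and the enforcement code are identical across both runs, which is one way to read the patch locality and runtime update claims above.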
