SDOF: Taming the Alignment Tax in Multi-Agent Orchestration with State-Constrained Dispatch
Zhantao Wang

TL;DR
SDOF is a multi-agent orchestration framework that enforces business-process stage constraints through a state machine, improving routing accuracy, auditability, and task completion in complex business workflows.
Contribution
It introduces a novel state-constrained dispatch framework with specialized intent routing and validation components, addressing the alignment tax in multi-agent orchestration.
Findings
Achieves an 86.5% task completion rate in a real-world recruitment system
Higher joint accuracy (80.9%) than zero-shot GPT-4o on FSM-constrained routing
Attains 100% precision and 88% recall in a message-level blocking audit
Abstract
Multi-agent orchestration frameworks such as LangChain, LangGraph, and CrewAI route tasks through graph-based pipelines but do not enforce the stage constraints that govern real business processes. We present SDOF, a framework that treats multi-agent execution as a constrained state machine. SDOF operates through two primary defensive layers, implemented by three components: (1) an Online-RLHF Specialized Intent Router trained via Generative Reward Modeling (GRPO) and (2) a StateAwareDispatcher with GoalStage finite-automaton checks and precondition/postcondition SkillRegistry validation for auditable execution control. On a recruitment system backed by the Beisen iTalent platform (6000+ enterprises), 185 expert-curated scenarios trigger 1671 live API calls. Our GSPO-aligned 7B Intent Router achieves higher joint accuracy than zero-shot GPT-4o on this FSM-constrained adversarial routing…
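To make the second layer concrete, the following is a minimal sketch of how a dispatcher might combine a GoalStage finite-automaton check with precondition/postcondition validation from a skill registry. It is an illustration under stated assumptions, not the authors' implementation: the stage names, the transition table `ALLOWED`, the `Skill` fields, the `dispatch` method, and the audit-log format are all hypothetical, and only the names `GoalStage`, `StateAwareDispatcher`, and the registry concept come from the abstract.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Set, Tuple

State = Dict[str, object]


class GoalStage(Enum):
    # Hypothetical recruitment stages; the source does not list them.
    SCREENING = "screening"
    INTERVIEW = "interview"
    OFFER = "offer"


# Assumed transition table: which stages are reachable from each stage.
ALLOWED: Dict[GoalStage, Set[GoalStage]] = {
    GoalStage.SCREENING: {GoalStage.SCREENING, GoalStage.INTERVIEW},
    GoalStage.INTERVIEW: {GoalStage.INTERVIEW, GoalStage.OFFER},
    GoalStage.OFFER: {GoalStage.OFFER},
}


@dataclass
class Skill:
    """Hypothetical registry entry: a skill bound to a stage with pre/post checks."""
    stage: GoalStage
    pre: Callable[[State], bool]
    run: Callable[[State], State]
    post: Callable[[State], bool]


@dataclass
class StateAwareDispatcher:
    stage: GoalStage
    registry: Dict[str, Skill]
    audit_log: List[Tuple[str, str]] = field(default_factory=list)

    def _block(self, intent: str, reason: str, state: State) -> State:
        # Record the refusal and leave the state untouched.
        self.audit_log.append((intent, f"blocked: {reason}"))
        return state

    def dispatch(self, intent: str, state: State) -> State:
        skill = self.registry.get(intent)
        if skill is None:
            return self._block(intent, "unknown skill", state)
        # Finite-automaton check: is the skill's stage reachable from here?
        if skill.stage not in ALLOWED[self.stage]:
            return self._block(intent, "stage transition not allowed", state)
        if not skill.pre(state):
            return self._block(intent, "precondition failed", state)
        new_state = skill.run(dict(state))  # run on a copy
        if not skill.post(new_state):
            return self._block(intent, "postcondition failed", state)
        self.stage = skill.stage
        self.audit_log.append((intent, "executed"))
        return new_state


registry = {
    "schedule_interview": Skill(
        stage=GoalStage.INTERVIEW,
        pre=lambda s: bool(s.get("resume_passed")),
        run=lambda s: {**s, "interview_scheduled": True},
        post=lambda s: s.get("interview_scheduled") is True,
    ),
    "send_offer": Skill(
        stage=GoalStage.OFFER,
        pre=lambda s: bool(s.get("interview_scheduled")),
        run=lambda s: {**s, "offer_sent": True},
        post=lambda s: s.get("offer_sent") is True,
    ),
}

dispatcher = StateAwareDispatcher(GoalStage.SCREENING, registry)
state: State = {"resume_passed": True}
state = dispatcher.dispatch("send_offer", state)          # blocked: skips a stage
state = dispatcher.dispatch("schedule_interview", state)  # executed
state = dispatcher.dispatch("send_offer", state)          # executed
```

In this sketch every decision, whether executed or blocked, is appended to `audit_log`, which is one simple way to obtain the kind of message-level audit trail the abstract refers to. The routed intent is passed in as a plain string here; in the described system it would come from the Intent Router.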
