Alignment as Institutional Design: From Behavioral Correction to Transaction Structure in Intelligent Systems
Rui Chai

TL;DR
This paper proposes viewing AI alignment as an institutional design problem, emphasizing transaction structures and economic principles to create robust, self-correcting systems rather than relying solely on behavioral correction methods.
Contribution
It introduces a framework that recasts AI alignment as a political-economy problem, applying institutional design principles drawn from economics.
Findings
Alignment can be achieved through transaction structures that make misalignment costly and detectable.
Three levels of human intervention are identified: structural, parametric, and monitorial.
The framework promotes institutional robustness over perfection in alignment strategies.
Abstract
Current AI alignment paradigms rely on behavioral correction: external supervisors (e.g., in RLHF) observe outputs, judge them against preferences, and adjust parameters. This paper argues that behavioral correction is structurally analogous to an economy without property rights, where order requires perpetual policing and does not scale. Drawing on institutional economics (Coase, Alchian, Cheung), capability mutual exclusivity, and competitive cost discovery, we propose alignment as institutional design: the designer specifies internal transaction structures (module boundaries, competition topologies, cost-feedback loops) such that aligned behavior emerges as the lowest-cost strategy for each component. We identify three irreducible levels of human intervention (structural, parametric, monitorial) and show that this framework transforms alignment from a behavioral control problem into a…
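The claim that aligned behavior should emerge as the lowest-cost strategy for each component can be illustrated with a toy expected-cost comparison. The sketch below is not the paper's model: the `TransactionStructure` class, its three parameters, and the linear cost formula are all hypothetical, chosen only to show how a cost-feedback loop that makes misalignment detectable and costly could change which strategy is cheapest for a module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransactionStructure:
    """Hypothetical designer-chosen parameters (assumed, not from the paper)."""
    detection_prob: float  # probability that a misaligned action is detected
    penalty: float         # cost charged to the module when it is detected
    shortcut_gain: float   # cost the module saves by acting misaligned


def expected_cost(structure: TransactionStructure, base_cost: float, aligned: bool) -> float:
    """Expected cost a module bears under the given structure."""
    if aligned:
        return base_cost
    # Misalignment saves the shortcut gain but risks the penalty.
    return base_cost - structure.shortcut_gain + structure.detection_prob * structure.penalty


def cheapest_strategy(structure: TransactionStructure, base_cost: float) -> str:
    """Return the strategy a purely cost-minimizing module would pick."""
    aligned_cost = expected_cost(structure, base_cost, aligned=True)
    misaligned_cost = expected_cost(structure, base_cost, aligned=False)
    return "aligned" if aligned_cost <= misaligned_cost else "misaligned"


# Two illustrative structures that differ only in how detectable misalignment is.
weak = TransactionStructure(detection_prob=0.1, penalty=5.0, shortcut_gain=2.0)
strong = TransactionStructure(detection_prob=0.8, penalty=5.0, shortcut_gain=2.0)

if __name__ == "__main__":
    print(cheapest_strategy(weak, base_cost=10.0))
    print(cheapest_strategy(strong, base_cost=10.0))
```

In this toy setting the aligned strategy is cheapest whenever `detection_prob * penalty >= shortcut_gain`, so the designer acts on the structure's parameters rather than on individual outputs.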
