Mechanical Enforcement for LLM Governance: Evidence of Governance-Task Decoupling in Financial Decision Systems
José Manuel de la Chica Rodríguez, Carlos Martí-González

TL;DR
This paper demonstrates that mechanical enforcement significantly improves governance compliance and decision transparency in regulated financial AI systems, decoupling governance quality from task accuracy.
Contribution
It introduces five governance metrics and shows that architectural separation via primitives enhances compliance and transparency beyond traditional text-only approaches.
Findings
Mechanical enforcement reduces non-informative deferrals by 73%.
Task accuracy improves from MCC 0.43 to 0.88 with mechanical enforcement.
Governance quality remains stable under stress, unlike text-only governance.
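For readers unfamiliar with the two quantities in the findings above, the following is a minimal sketch of how they are usually computed. It assumes a binary decision task for the Matthews correlation coefficient (MCC) and reads "reduces by 73%" as a relative reduction of the 27% baseline rate stated in the abstract; both are assumptions, since this summary does not say how the paper computes them. The confusion-matrix counts are hypothetical and are not taken from the paper.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: return 0 when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def rate_after_relative_reduction(baseline: float, reduction: float) -> float:
    """Rate that remains after cutting `baseline` by the fraction `reduction`."""
    return baseline * (1.0 - reduction)


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not from the paper).
    print(f"MCC: {mcc(tp=45, tn=45, fp=5, fn=5):.2f}")
    # Assumption: the 73% figure is relative to the 27% text-only baseline.
    remaining = rate_after_relative_reduction(0.27, 0.73)
    print(f"Non-informative deferral rate after reduction: {remaining:.1%}")
```

Under the relative-reduction reading, roughly 7% of deferrals would remain non-informative with mechanical enforcement. MCC ranges from -1 to 1 and stays informative when classes are imbalanced, which is a common reason to prefer it over plain accuracy.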
Abstract
Large language models in regulated financial workflows are governed by natural-language policies that the same model interprets, creating a principal-agent failure: outputs can appear compliant without being compliant. Existing evaluation measures task accuracy but not whether governance constrains behaviour at the decision rationale level, where regulated decisions must be auditable. We introduce five governance metrics that quantify policy compliance at the rationale level and apply them in a synthetic banking domain to compare text-only governance against mechanical enforcement: four primitives operating outside the model's interpretive loop. Under text-only governance, 27% of deferrals carry no decision-relevant information. Mechanical enforcement reduces this rate by 73%, more than doubles deferral information content, and raises task accuracy from MCC 0.43 to 0.88. The…
