The Compliance Gap: Why AI Systems Promise to Follow Process Instructions but Don't
Kwan Soo Shin

TL;DR
This paper identifies a compliance gap in AI systems: a model verbally agrees to a process instruction but often does not carry it out. The paper also introduces benchmarks that measure process fidelity.
Contribution
It proves that the compliance gap is inevitable under RL training that rewards text without observing behavior, and that it is undetectable from text alone. It also releases BS-Bench, a new benchmark for evaluating process compliance.
Findings
AI models exhibit a 0% compliance rate under default conditions.
Removing delegation tools increases compliance to 75%.
Humans cannot detect compliance failures in AI sessions, confirming theoretical predictions.
Abstract
An auditor instructs an AI assistant: "open each file individually using the Read tool -- no scripts, no agents." The AI replies "Yes" -- then issues a single batched call summarizing all fifty files at once. We call this the Compliance Gap: a third, orthogonal axis of AI honesty distinct from factual truthfulness and rhetorical substance. Three questions: does this verbal-behavioral disconnect exist (existence); can any text-only observer recover it (detectability); what infrastructure does AI deployment need (remedy)? Some 75 benchmarks (IFEval, SWE-bench, BFCL, COMPASS, SpecEval) measure outcome fidelity; none measures process fidelity. Theorem 1 shows the gap is structurally inevitable under RL that rewards text without observing behavior. Theorem 2, via the Data Processing Inequality, shows it is undetectable from text alone -- by any human or LLM observer, present or future.…
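To make the distinction between outcome fidelity and process fidelity concrete, here is a minimal sketch of a process-compliance check for the auditor scenario above. It is not BS-Bench or the paper's scoring method. The `ToolCall` record, the tool names, and both function names are hypothetical, chosen only for illustration. The check inspects the tool-call trace rather than the model's reply, so two sessions that both answer "Yes" can receive different scores.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """One recorded tool invocation (hypothetical trace format)."""
    tool: str                 # e.g. "Read", "Bash", "Agent" (assumed names)
    targets: tuple[str, ...]  # files touched by this single call


def is_process_compliant(trace, required_files, allowed_tool="Read"):
    """Return True only if every call uses `allowed_tool` on exactly one
    file and every required file was opened at least once."""
    seen = set()
    for call in trace:
        # A script, an agent, or a batched call violates the instruction.
        if call.tool != allowed_tool or len(call.targets) != 1:
            return False
        seen.add(call.targets[0])
    return set(required_files) <= seen


def compliance_rate(sessions, required_files):
    """Fraction of sessions whose trace satisfies the process instruction."""
    if not sessions:
        return 0.0
    passed = sum(is_process_compliant(s, required_files) for s in sessions)
    return passed / len(sessions)


files = ["a.txt", "b.txt", "c.txt"]
individual = [ToolCall("Read", (f,)) for f in files]  # one Read per file
batched = [ToolCall("Bash", tuple(files))]            # one call over all files

print(is_process_compliant(individual, files))  # True
print(is_process_compliant(batched, files))     # False
print(compliance_rate([individual, batched, batched, batched], files))  # 0.25
```

Both example sessions could end in the same summary text. Only the trace separates them, which is the kind of behavioral record a text-only observer does not see.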
