NP-Hard Lower Bound Complexity for Semantic Self-Verification
Robin Young

TL;DR
This paper proves that semantic self-verification, the problem of checking whether an AI system has correctly interpreted the rules intended to govern its behavior, is NP-complete under the paper's formalization, so even simplified versions face significant computational barriers.
Contribution
The paper establishes the NP-completeness of semantic self-verification, highlighting fundamental computational limitations relevant to AI safety and interpretability.
Findings
Semantic Self-Verification (SSV) is NP-complete via a polynomial-time reduction from 3-SAT
Semantic verification faces computational barriers
Realistic scenarios likely have even higher complexity
Abstract
We model Semantic Self-Verification (SSV) as the problem of determining whether a statement accurately characterizes its own semantic properties within a given interpretive framework. This formalizes a challenge in AI safety and fairness: can an AI system verify that it has correctly interpreted rules intended to govern its behavior? We prove that SSV, in this specification, is NP-complete by constructing a polynomial-time reduction from 3-Satisfiability (3-SAT). Our reduction maps a 3-SAT formula to an instance of SSV involving ambiguous terms with binary interpretations and semantic constraints derived from logical clauses. This establishes that even simplified forms of semantic self-verification face computational barriers. The NP-complete lower bound has implications for AI safety and fairness approaches that rely on semantic interpretation of instructions, including but not…
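The shape of the reduction described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's construction: the encoding, the `term_k` naming, and the functions `reduce_3sat_to_ssv`, `check_interpretation`, and `find_consistent_interpretation` are all assumptions made for illustration. It assumes each 3-SAT variable becomes one ambiguous term with two candidate readings (0 or 1), and each clause becomes one semantic constraint requiring at least one of three term readings to hold.

```python
from itertools import product

# A 3-SAT clause is a tuple of three non-zero ints:
# k means variable k is true, -k means variable k is false.


def reduce_3sat_to_ssv(clauses):
    """Map a 3-SAT formula to a toy SSV instance (hypothetical encoding)."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    # One ambiguous term per variable, each with two candidate readings.
    terms = [f"term_{v}" for v in variables]
    # One semantic constraint per clause: a list of (term, required_reading)
    # pairs, at least one of which must hold.
    constraints = [
        [(f"term_{abs(lit)}", 1 if lit > 0 else 0) for lit in clause]
        for clause in clauses
    ]
    return terms, constraints


def check_interpretation(constraints, interpretation):
    """Check a candidate interpretation in time linear in the constraints."""
    return all(
        any(interpretation[term] == reading for term, reading in constraint)
        for constraint in constraints
    )


def find_consistent_interpretation(terms, constraints):
    """Exhaustive search over all 2^n interpretations (exponential time)."""
    for readings in product((0, 1), repeat=len(terms)):
        interpretation = dict(zip(terms, readings))
        if check_interpretation(constraints, interpretation):
            return interpretation
    return None
```

The mapping itself runs in time linear in the formula size, and checking a proposed interpretation is equally cheap, while the only general search shown here enumerates every combination of readings. Under this assumed encoding, a consistent interpretation exists exactly when the original formula is satisfiable.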
