FormalJudge: A Neuro-Symbolic Paradigm for Agentic Oversight
Jiayi Zhou, Yang Sheng, Hantao Lou, Yaodong Yang, Jie Fu

TL;DR
FormalJudge introduces a neuro-symbolic framework combining LLMs and formal verification to improve agent oversight, providing reliable, mathematically guaranteed safety checks for high-stakes AI systems.
Contribution
It bridges natural language specifications and formal verification using a bidirectional architecture, enabling scalable, accurate oversight of AI agents with mathematical guarantees.
Findings
16.6% average improvement over baseline oversight methods
Over 90% accuracy in deception detection with smaller models
Near-linear safety enhancement through iterative refinement
Abstract
As LLM-based agents increasingly operate in high-stakes domains with real-world consequences, ensuring their behavioral safety becomes paramount. The dominant oversight paradigm, LLM-as-a-Judge, faces a fundamental dilemma: how can probabilistic systems reliably supervise other probabilistic systems without inheriting their failure modes? We argue that formal verification offers a principled escape from this dilemma, yet its adoption has been hindered by a critical bottleneck: the translation from natural language requirements to formal specifications. This paper bridges this gap by proposing , a neuro-symbolic framework that employs a bidirectional Formal-of-Thought architecture: LLMs serve as specification compilers that top-down decompose high-level human intent into atomic, verifiable constraints, then bottom-up prove compliance using Dafny specifications and Z3 Satisfiability…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsEthics and Social Impacts of AI · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning
