Tool Receipts, Not Zero-Knowledge Proofs: Practical Hallucination Detection for AI Agents
Abhinaba Basu

TL;DR
NabaOS is a lightweight verification framework that detects AI agent hallucinations using signed tool receipts and epistemic classification, at far lower cost and latency than zero-knowledge proofs in real-time scenarios.
Contribution
We introduce NabaOS, a lightweight verification system that classifies claims by epistemic source and uses signed tool receipts for real-time hallucination detection in AI agents.
Findings
Detects over 90% of hallucinations with <15 ms overhead
Outperforms zero-knowledge proofs in cost and latency
Effective across multiple languages and hallucination types
Abstract
AI agents that execute tasks via tool calls frequently hallucinate results: fabricating tool executions, misstating output counts, or presenting inferences as facts. Recent approaches to verifiable AI inference rely on zero-knowledge proofs, which provide cryptographic guarantees but impose minutes of proving time per query, making them impractical for interactive agents. We propose NabaOS, a lightweight verification framework inspired by Indian epistemology (Nyaya Shastra), which classifies every claim in an LLM response by its epistemic source (pramana): direct tool output (pratyaksha), inference (anumana), external testimony (shabda), absence (abhava), or ungrounded opinion. Our runtime generates HMAC-signed tool execution receipts that the LLM cannot forge, then cross-references claims against these receipts to detect hallucinations in real time. We evaluate on NyayaVerifyBench, a…
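The receipt mechanism described in the abstract can be illustrated with a minimal Python sketch. This is not the NabaOS implementation: the function names (`sign_receipt`, `verify_receipt`, `check_count_claim`), the receipt fields, and the result labels are all hypothetical, and the sketch assumes the signing key is held by the runtime and never exposed to the LLM. It covers only two of the hallucination types named above, fabricated tool executions and misstated output counts.

```python
import hashlib
import hmac
import json
import secrets

# Assumption: the key lives in the agent runtime and never enters the LLM context.
RUNTIME_KEY = secrets.token_bytes(32)


def _canonical(tool, args, output):
    """Serialize a tool call deterministically so the HMAC is reproducible."""
    payload = {"tool": tool, "args": args, "output": output}
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_receipt(tool, args, output, key=RUNTIME_KEY):
    """Runtime side: record a tool execution together with an HMAC-SHA256 tag."""
    tag = hmac.new(key, _canonical(tool, args, output), hashlib.sha256).hexdigest()
    return {"tool": tool, "args": args, "output": output, "tag": tag}


def verify_receipt(receipt, key=RUNTIME_KEY):
    """Recompute the tag and compare in constant time."""
    expected = hmac.new(
        key,
        _canonical(receipt["tool"], receipt["args"], receipt["output"]),
        hashlib.sha256,
    ).hexdigest()
    return hmac.compare_digest(expected, receipt.get("tag", ""))


def check_count_claim(claimed_tool, claimed_count, receipts, key=RUNTIME_KEY):
    """Cross-reference a claim of the form 'tool X returned N items' (hypothetical claim shape)."""
    for receipt in receipts:
        if receipt["tool"] == claimed_tool and verify_receipt(receipt, key):
            if len(receipt["output"]) == claimed_count:
                return "grounded"
            return "count_mismatch"
    # No valid receipt: the tool call was fabricated or the receipt was tampered with.
    return "no_receipt"


receipt = sign_receipt("search_files", {"pattern": "*.py"}, ["a.py", "b.py", "c.py"])
print(check_count_claim("search_files", 3, [receipt]))
print(check_count_claim("search_files", 5, [receipt]))
print(check_count_claim("send_email", 1, [receipt]))
```

Because the tag depends on a key the model never sees, a response that edits a receipt's output or invents a receipt for a tool that never ran fails verification. Classifying claims by pramana, as the abstract describes, would sit on top of a check like this and is not shown here.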
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Security and Verification in Computing · Advanced Malware Detection Techniques
