The Cognitive Circuit Breaker: A Systems Engineering Framework for Intrinsic AI Reliability
Jonathan Pan

TL;DR
The paper introduces the Cognitive Circuit Breaker, a systems engineering framework for intrinsic AI reliability that detects hallucinations in LLMs with minimal latency by analyzing internal model states.
Contribution
It proposes a novel intrinsic reliability monitoring method using internal states to detect hallucinations, reducing reliance on external post-generation checks.
Findings
Detected cognitive dissonance correlates significantly with hallucinations.
The framework generalizes across different architectures and to out-of-distribution (OOD) data.
It adds negligible computational overhead to the inference pipeline.
Abstract
As Large Language Models (LLMs) are increasingly deployed in mission-critical software systems, detecting hallucinations and "faked truthfulness" has become a paramount engineering challenge. Current reliability architectures rely heavily on post-generation, black-box mechanisms, such as Retrieval-Augmented Generation (RAG) cross-checking or LLM-as-a-judge evaluators. These extrinsic methods introduce unacceptable latency, high computational overhead, and reliance on secondary external API calls, frequently violating standard software engineering Service Level Agreements (SLAs). In this paper, we propose the Cognitive Circuit Breaker, a novel systems engineering framework that provides intrinsic reliability monitoring with minimal latency overhead. By extracting hidden states during a model's forward pass, we calculate the "Cognitive Dissonance Delta" -- the mathematical gap between…
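The circuit-breaker idea in the abstract can be illustrated with a minimal Python sketch. It shows only the generic control flow: score each generation step from its hidden state and halt once the score crosses a threshold. The abstract is truncated before it defines the Cognitive Dissonance Delta, so the score here is an opaque, caller-supplied function. The names `CircuitBreaker`, `guarded_generate`, `placeholder_score`, `threshold`, and `patience` are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence, Tuple


@dataclass
class CircuitBreaker:
    """Trips after `patience` consecutive scores above `threshold` (both hypothetical knobs)."""
    threshold: float
    patience: int = 1
    strikes: int = 0
    tripped: bool = False

    def observe(self, score: float) -> bool:
        """Record one score and return True if the breaker is (now) open."""
        if self.tripped:
            return True
        # Count consecutive violations; any in-range score resets the count.
        self.strikes = self.strikes + 1 if score > self.threshold else 0
        if self.strikes >= self.patience:
            self.tripped = True
        return self.tripped


def placeholder_score(hidden: Sequence[float]) -> float:
    """Stand-in score (mean absolute activation). NOT the paper's delta, whose definition is truncated."""
    return sum(abs(x) for x in hidden) / len(hidden)


def guarded_generate(
    steps: Iterable[Tuple[str, Sequence[float]]],
    score_fn: Callable[[Sequence[float]], float],
    breaker: CircuitBreaker,
) -> Tuple[List[str], bool]:
    """Emit tokens until the breaker trips; `steps` yields (token, hidden_state) pairs."""
    emitted: List[str] = []
    for token, hidden in steps:
        if breaker.observe(score_fn(hidden)):
            break  # stop generation instead of emitting a flagged token
        emitted.append(token)
    return emitted, breaker.tripped


# Toy stream of (token, hidden_state) pairs; the values are made up for illustration.
demo_steps = [("a", [0.1, 0.1]), ("b", [0.2, 0.1]), ("c", [2.0, 3.0]), ("d", [0.1, 0.1])]
tokens, tripped = guarded_generate(demo_steps, placeholder_score, CircuitBreaker(threshold=1.0))
```

Because the check is applied to values already produced during the forward pass, a monitor of this shape needs no second model call, which is the property the abstract contrasts with RAG cross-checking and LLM-as-a-judge evaluators. In a real system the score function would be replaced by whatever the paper defines.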
