Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits
Amirhosein Ghasemabadi, Di Niu

TL;DR
This paper introduces Gnosis, a lightweight mechanism that enables large language models to predict their own failures by analyzing their internal states, improving self-assessment accuracy without external judges and at negligible additional compute.
Contribution
Gnosis is a novel, efficient self-awareness method that decodes internal signals of frozen LLMs to predict the correctness of their own outputs, outperforming external judges across multiple benchmarks.
Findings
Gnosis improves failure prediction accuracy across tasks including math reasoning, open-domain question answering, and academic knowledge.
It generalizes zero-shot to partial generations, enabling early failure detection.
It adds few parameters (~5M) and negligible inference cost.
Abstract
Large language models (LLMs) generate fluent and complex outputs but often fail to recognize their own mistakes and hallucinations. Existing approaches typically rely on external judges, multi-sample consistency, or text-based self-critique, which incur additional compute or correlate weakly with true correctness. We ask: can LLMs predict their own failures by inspecting internal states during inference? We introduce Gnosis, a lightweight self-awareness mechanism that enables frozen LLMs to perform intrinsic self-verification by decoding signals from hidden states and attention patterns. Gnosis passively observes internal traces, compresses them into fixed-budget descriptors, and predicts correctness with negligible inference cost, adding only ~5M parameters and operating independently of sequence length. Across math reasoning, open-domain question answering, and academic knowledge…
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Computational and Text Analysis Methods
