Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits

Amirhosein Ghasemabadi; Di Niu

arXiv:2512.20578·cs.CL·January 6, 2026

TL;DR

This paper introduces Gnosis, a lightweight mechanism that lets frozen large language models predict their own failures by decoding hidden states and attention patterns, without external judges and at negligible inference cost.

Contribution

Gnosis is a novel, efficient self-awareness method that decodes internal signals of frozen LLMs to predict correctness, outperforming external judges across multiple benchmarks.

Findings

01 Gnosis improves failure-prediction accuracy across various tasks.

02 It generalizes zero-shot to partial generations, enabling early failure detection.

03 It adds only ~5M parameters and incurs negligible inference cost.

Abstract

Large language models (LLMs) generate fluent and complex outputs but often fail to recognize their own mistakes and hallucinations. Existing approaches typically rely on external judges, multi-sample consistency, or text-based self-critique, which incur additional compute or correlate weakly with true correctness. We ask: can LLMs predict their own failures by inspecting internal states during inference? We introduce Gnosis, a lightweight self-awareness mechanism that enables frozen LLMs to perform intrinsic self-verification by decoding signals from hidden states and attention patterns. Gnosis passively observes internal traces, compresses them into fixed-budget descriptors, and predicts correctness with negligible inference cost, adding only ~5M parameters and operating independently of sequence length. Across math reasoning, open-domain question answering, and academic knowledge…
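The idea of compressing a variable-length internal trace into a fixed-budget descriptor and scoring it can be illustrated with a minimal sketch. This is not the authors' architecture: the chunked mean pooling, the logistic scoring head, and the names `fixed_budget_descriptor`, `predict_correctness`, and `budget` are all assumptions chosen for illustration, and the sketch covers hidden states only, not attention patterns. It shows only why the descriptor size, and therefore the cost of the prediction head, can be independent of sequence length.

```python
import math

def fixed_budget_descriptor(hidden, budget=4):
    """Compress a (seq_len x d_model) trace into budget * d_model numbers.

    Hypothetical stand-in for a fixed-budget descriptor: token positions are
    split into `budget` contiguous chunks and each chunk is mean-pooled.
    """
    seq_len = len(hidden)
    d_model = len(hidden[0])
    descriptor = []
    for i in range(budget):
        lo = (i * seq_len) // budget
        # Guarantee at least one token per chunk when seq_len < budget.
        hi = max(((i + 1) * seq_len) // budget, lo + 1)
        chunk = hidden[lo:hi]
        for j in range(d_model):
            descriptor.append(sum(row[j] for row in chunk) / len(chunk))
    return descriptor

def predict_correctness(descriptor, weights, bias=0.0):
    """Assumed logistic head: maps a descriptor to a correctness probability."""
    z = sum(w * x for w, x in zip(weights, descriptor)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Traces of different lengths yield descriptors of the same size.
short_trace = [[0.1, -0.2]] * 3
long_trace = [[0.1, -0.2]] * 50
assert len(fixed_budget_descriptor(short_trace)) == len(fixed_budget_descriptor(long_trace))
```

In this sketch the weights would be trained on traces labeled correct or incorrect while the LLM itself stays frozen; the training procedure is omitted because the text above does not describe it.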

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Computational and Text Analysis Methods