Geometric Deviation as an Unsupervised Pre-Generation Reliability Signal: Probing LLM Representations for Answerability
Yucheng Du

TL;DR
This paper explores how the geometry of hidden representations in language models can serve as an unsupervised, pre-generation signal to identify answerable queries, especially in structured domains like math.
Contribution
It demonstrates that geometric deviation in hidden states can signal answerability in mathematical prompts without labeled failure data or access to model outputs, outperforming a simple refusal baseline.
Findings
Across models, hidden-state geometry primarily encodes task form (Math, Fact, Code) rather than answerability.
Within math prompts, unanswerable inputs deviate consistently from the answerable centroid, enabling detection (ROC-AUC 0.78-0.84).
The signal arises early in model layers and diminishes toward output.
Abstract
A reliable language model should be able to signal, prior to generation, when a query falls outside its knowledge. We investigate whether representation geometry can provide such a pre-generation signal by measuring the deviation of hidden states from an answerable reference set, requiring no labeled failure data and no access to model outputs. Across three instruction-tuned models (Llama 3.1-8B, Qwen 2.5-7B, and Mistral-7B-Instruct) and three prompt forms (Math, Fact, Code), we find that geometry primarily encodes task form. Within mathematical prompts, unanswerable inputs consistently deviate from the answerable centroid, yielding strong separation (ROC-AUC 0.78-0.84). This single-pass pre-generation signal outperforms a simple refusal baseline and compares favorably to self-consistency. It also captures cases where models do not explicitly refuse. In contrast, no reliable geometric…
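The deviation score described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not state the distance metric, the layer, or the pooling used, so the sketch takes Euclidean distance from the mean of a reference set, and it uses synthetic Gaussian vectors in place of real hidden states. The function names `centroid_deviation` and `roc_auc` are hypothetical and do not come from the paper.

```python
import numpy as np


def centroid_deviation(reference, queries):
    """Euclidean distance of each query vector from the reference centroid.

    Assumption: the paper's exact deviation measure is not given in the
    abstract; plain Euclidean distance is used here as a stand-in.
    """
    reference = np.asarray(reference, dtype=float)
    queries = np.asarray(queries, dtype=float)
    centroid = reference.mean(axis=0)
    return np.linalg.norm(queries - centroid, axis=1)


def roc_auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    wins = (pos > neg).sum() + 0.5 * (pos == neg).sum()
    return float(wins / (pos.size * neg.size))


# Synthetic stand-ins for hidden states (not real model activations).
rng = np.random.default_rng(0)
dim = 16
reference = rng.normal(0.0, 1.0, size=(200, dim))           # answerable reference set
answerable = rng.normal(0.0, 1.0, size=(100, dim))          # held-out answerable
unanswerable = rng.normal(0.0, 1.0, size=(100, dim)) + 1.5  # shifted cluster

dev_answerable = centroid_deviation(reference, answerable)
dev_unanswerable = centroid_deviation(reference, unanswerable)

# Higher deviation is treated as evidence of unanswerability.
auc = roc_auc(dev_unanswerable, dev_answerable)
print(f"ROC-AUC on synthetic data: {auc:.3f}")
```

Only the reference set needs to be known to be answerable. Labels for the held-out examples are used solely to evaluate the score, which matches the abstract's claim that no labeled failure data is required to compute the signal.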
