From Prerequisites to Predictions: Validating a Geometric Hallucination Taxonomy Through Controlled Induction
Matic Korun

TL;DR
This study tests a geometric hallucination taxonomy in GPT-2 through controlled induction and prompt-level statistical analysis, and finds that coverage gaps (Type 3) are the most geometrically distinctive failure mode.
Contribution
It introduces a controlled induction method to distinguish hallucination types in language models, confirming coverage gaps as a key failure mode.
Findings
Coverage-gap hallucinations are the most geometrically distinctive failure mode.
Type 3 norm separation is robust in static embeddings.
Types 1 and 2 do not separate in either embedding space.
Abstract
We test whether a geometric hallucination taxonomy, which classifies failures as center-drift (Type 1), wrong-well convergence (Type 2), or coverage gaps (Type 3), can distinguish hallucination types through controlled induction in GPT-2. Using a two-level statistical design with prompts as the unit of inference, we run each experiment 20 times with different generation seeds to quantify result stability. In static embeddings, Type 3 norm separation is robust (significant in 18/20 runs, Holm-corrected in 14/20). In contextual hidden states, the Type 3 norm effect direction is stable (19/20 runs) but underpowered (significant in 4/20). Types 1 and 2 do not separate in either space. Token-level tests inflate significance by 4–16 through pseudoreplication, a finding replicated across all 20…
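The two-level design described in the abstract can be illustrated with a minimal sketch. It assumes token-level measurements (for example, embedding norms) are first averaged within each prompt, that groups are then compared with a permutation test on prompt-level means, and that p-values across comparisons are adjusted with the Holm step-down procedure. The permutation test is a stand-in chosen for illustration, not necessarily the test used in the paper, and all function names here are hypothetical.

```python
import random
from statistics import mean


def prompt_level_means(tokens_by_prompt):
    """Collapse each prompt's token-level values to a single mean per prompt."""
    return [mean(values) for values in tokens_by_prompt]


def permutation_pvalue(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation test on the absolute difference of group means.

    Assumption: this is an illustrative stand-in, not the paper's test.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * pvalues[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted
```

Passing the output of `prompt_level_means` to `permutation_pvalue` makes the prompt the unit of inference. Pooling all tokens from all prompts into one sample would instead treat correlated tokens as independent observations, which is the pseudoreplication the abstract warns about.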
Taxonomy
Topics: Computability, Logic, AI Algorithms · Functional Brain Connectivity Studies · Schizophrenia research and treatment
