Predictive Coding and Information Bottleneck for Hallucination Detection in Large Language Models
Manish Bhatt

TL;DR
This paper introduces a lightweight, interpretable detection framework for hallucinations in large language models, leveraging neuroscience-inspired signals to improve accuracy and efficiency over existing methods.
Contribution
It presents a novel hybrid detection model combining Predictive Coding and Information Bottleneck signals, achieving high performance with significantly less data and faster inference.
Findings
Achieves 0.8669 AUROC on HaluBench with improved features
Outperforms existing methods in data efficiency and speed
Identifies limitations of Rationalization signals in hallucination detection
Abstract
Hallucinations in Large Language Models (LLMs) -- generations that are plausible but factually unfaithful -- remain a critical barrier to high-stakes deployment. Current detection methods typically rely on computationally expensive external retrieval loops or opaque black-box LLM judges requiring 70B+ parameters. In this work, we introduce [Model Name], a hybrid detection framework that combines neuroscience-inspired signal design with supervised machine learning. We extract interpretable signals grounded in Predictive Coding (quantifying surprise against internal priors) and the Information Bottleneck (measuring signal retention under perturbation). Through systematic ablation, we demonstrate three key enhancements: Entity-Focused Uptake (concentrating on high-value tokens), Context Adherence (measuring grounding strength), and Falsifiability Score (detecting confident but…
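To make the two signal families named in the abstract more concrete, here is a minimal Python sketch. It is not the paper's implementation: the abstract does not give the signal definitions, so both functions are assumptions made for illustration. `mean_surprisal` treats "surprise" as the average negative log-probability the model assigns to its own answer tokens. `perturbation_gap` is a hypothetical proxy for "retention under perturbation": it compares the surprisal of the same answer when the context is intact and when it is perturbed. The function names and the per-token probability inputs are illustrative, not taken from the paper.

```python
import math
from typing import Sequence


def mean_surprisal(token_probs: Sequence[float]) -> float:
    """Average surprisal (negative log-probability, in nats) over answer tokens.

    Assumption: each entry is the probability the model assigned to the token
    it actually generated. Higher values mean the answer was less expected.
    """
    if not token_probs:
        raise ValueError("token_probs must be non-empty")
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def perturbation_gap(
    probs_with_context: Sequence[float],
    probs_perturbed_context: Sequence[float],
) -> float:
    """Hypothetical proxy for how strongly an answer depends on its context.

    Returns the increase in mean surprisal of the same answer tokens when the
    context is perturbed. Under this sketch's assumption, a large positive gap
    suggests the answer leaned on the context, while a gap near zero suggests
    the answer came mostly from the model's priors.
    """
    return mean_surprisal(probs_perturbed_context) - mean_surprisal(probs_with_context)


if __name__ == "__main__":
    # Toy per-token probabilities, invented for illustration only.
    with_context = [0.9, 0.8, 0.85]
    perturbed = [0.3, 0.2, 0.25]
    print("surprisal with context:", mean_surprisal(with_context))
    print("perturbation gap:", perturbation_gap(with_context, perturbed))
```

In a supervised setup like the one the abstract describes, scalar features of this kind would be fed to a small classifier; which features and which classifier the authors actually use is not stated in the text above.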
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare · Adversarial Robustness in Machine Learning
