SymLoc: Symbolic Localization of Hallucination across HaluEval and TruthfulQA
Naveen Lamba, Sanju Tiwari, Manas Gaur

TL;DR
This paper introduces SymLoc, a symbolic localization framework that uses linguistic knowledge to identify where hallucinations originate in LLMs, revealing early-layer failures in processing symbolic triggers.
Contribution
SymLoc is the first approach to leverage symbolic linguistic knowledge for localizing hallucinations across model layers, offering new insight into how LLMs fail when processing symbolic elements.
Findings
Attention variance explodes in early layers for symbolic triggers
Hallucination rates remain high despite larger models
Symbolic semantic processing breaks down early in model layers
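The first finding can be made concrete with a small sketch. The summary does not say how attention variance is measured, so this sketch assumes one plausible operationalization. For each layer, it compares the variance of the attention weights emitted by symbolic-trigger tokens with the variance for all other tokens, then localizes to the layer with the largest ratio. The nested-list layout `attn[layer][head][query][key]`, the function names, and the toy numbers are hypothetical illustrations, not SymLoc's actual procedure.

```python
from __future__ import annotations

from statistics import pvariance


def variance_ratio_by_layer(attn: list, trigger_positions: list[int], eps: float = 1e-12) -> list[float]:
    """Hypothetical metric: per layer, variance of attention weights emitted by
    trigger tokens divided by the variance for non-trigger tokens.
    attn[layer][head][query][key] holds attention weights (assumed layout)."""
    triggers = set(trigger_positions)
    ratios = []
    for layer in attn:
        trigger_vals, other_vals = [], []
        for head in layer:
            for q, row in enumerate(head):
                # Route each query row to the trigger or non-trigger pool.
                (trigger_vals if q in triggers else other_vals).extend(row)
        # eps guards against a zero baseline variance.
        ratios.append(pvariance(trigger_vals) / (pvariance(other_vals) + eps))
    return ratios


def localize_layer(ratios: list[float]) -> int:
    """Return the index of the layer with the largest variance ratio."""
    return max(range(len(ratios)), key=ratios.__getitem__)


# Toy input: 2 layers, 1 head, 3 tokens; token 0 is the symbolic trigger.
toy_attn = [
    [[[0.9, 0.05, 0.05], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4]]],  # layer 0
    [[[0.4, 0.3, 0.3], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4]]],    # layer 1
]
ratios = variance_ratio_by_layer(toy_attn, trigger_positions=[0])
print(ratios, localize_layer(ratios))
```

In the toy input, only layer 0 gives the trigger token a sharply peaked attention row, so the ratio spikes there and the sketch localizes to the earliest layer. With a real model, the weights would come from the model's per-layer attention outputs.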
Abstract
LLMs still struggle with hallucination, especially when confronted with symbolic triggers like modifiers, negation, numbers, exceptions, and named entities. Yet, we lack a clear understanding of where these symbolic hallucinations originate, making it crucial to systematically handle such triggers and localize the emergence of hallucination inside the model. While prior work explored localization using statistical techniques like LSC and activation variance analysis, these methods treat all tokens equally and overlook the role symbolic linguistic knowledge plays in triggering hallucinations. So far, no approach has investigated how symbolic elements specifically drive hallucination failures across model layers, nor has symbolic linguistic knowledge been used as the foundation for a localization framework. We propose the first symbolic localization framework that leverages symbolic…
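The abstract names five trigger categories: modifiers, negation, numbers, exceptions, and named entities. A minimal lexical tagger for them might look like the following. The word lists, category labels, and the capitalization heuristic for named entities are assumptions made for illustration, not the authors' trigger inventory. A real pipeline would more likely rely on a POS tagger and an NER model.

```python
import re

# Hypothetical, deliberately tiny word lists; not SymLoc's trigger inventory.
NEGATION = {"not", "no", "never", "none", "nor", "cannot"}
EXCEPTION = {"except", "unless", "besides", "excluding"}
MODIFIER = {"only", "exactly", "always", "most", "least", "very"}

# Tokens are alphabetic words or (possibly decimal) numbers.
TOKEN_RE = re.compile(r"[A-Za-z]+|\d+(?:\.\d+)?")


def tag_triggers(text):
    """Return (token_index, token, category) for each symbolic trigger found."""
    tags = []
    for i, tok in enumerate(TOKEN_RE.findall(text)):
        low = tok.lower()
        if tok[0].isdigit():
            category = "number"
        elif low in NEGATION:
            category = "negation"
        elif low in EXCEPTION:
            category = "exception"
        elif low in MODIFIER:
            category = "modifier"
        elif tok[0].isupper() and i > 0:
            # Crude named-entity proxy: capitalized and not sentence-initial.
            category = "named_entity"
        else:
            continue
        tags.append((i, tok, category))
    return tags


tags = tag_triggers("Einstein did not win exactly 2 Nobel prizes except in Physics")
print(tags)
```

Because the heuristic skips the first token, a sentence-initial entity such as "Einstein" is missed. This limitation is one reason a proper NER model is preferable for the named-entity category.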
Taxonomy
Topics: Neurobiology of Language and Bilingualism · Topic Modeling · Text Readability and Simplification
