Where Knowledge Collides: A Mechanistic Study of Intra-Memory Knowledge Conflict in Language Models

Minh Vu Pham; Hsuvas Borkakoty; Yufang Hou

arXiv:2601.09445 · cs.CL · January 15, 2026


TL;DR

This paper investigates how conflicting knowledge about the same event is internally encoded in language models, using interpretability techniques to locate and control these conflicts during inference.

Contribution

It introduces a framework, based on mechanistic interpretability, for identifying the internal representations responsible for knowledge conflicts in language models and intervening on them.

Findings

1. Specific internal components encode conflicting knowledge acquired during pre-training.
2. Mechanistic interpretability methods enable causal interventions on these conflicts.
3. The framework helps localize and control knowledge conflicts during inference.

Abstract

In language models (LMs), intra-memory knowledge conflict largely arises when inconsistent information about the same event is encoded within the model's parametric knowledge. While prior work has primarily focused on resolving conflicts between a model's internal knowledge and external resources through approaches such as fine-tuning or knowledge editing, the problem of localizing conflicts that originate during pre-training within the model's internal representations remains unexplored. In this work, we design a framework based on mechanistic interpretability methods to identify where and how conflicting knowledge from the pre-training data is encoded within LMs. Our findings contribute to a growing body of evidence that specific internal components of a language model are responsible for encoding conflicting knowledge from pre-training, and we demonstrate how mechanistic…
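The summary does not spell out the framework's procedures. As a rough illustration of the kind of causal intervention that mechanistic interpretability relies on, the sketch below applies generic activation patching (to localize) and component ablation (to control) on a hypothetical two-component toy model. Everything here (W_COMP, W_U, run, patching_scores, the 2-d inputs) is invented for illustration; it is not the authors' framework and says nothing about which components real LMs use.

```python
import numpy as np

# Hypothetical toy model (an assumption, not the paper's architecture):
# two components read a 2-d input and write into a shared residual stream.
# An unembedding maps the stream to logits for two conflicting answers,
# index 0 = answer A and index 1 = answer B.
W_COMP = [
    np.array([[1.0, 0.0], [0.0, 0.0]]),  # component 0 writes input dim 0 (supports A)
    np.array([[0.0, 0.0], [0.0, 1.0]]),  # component 1 writes input dim 1 (supports B)
]
W_U = np.array([[2.0, 0.0], [0.0, 1.0]])  # residual stream -> logits


def run(x, patch=None, scale=None):
    """Forward pass; returns (logits, list of per-component outputs).

    patch: optional {component: activation} that overwrites a component's
           output (activation patching).
    scale: optional {component: factor} that rescales a component's output
           (a simple inference-time intervention).
    """
    resid = np.zeros(2)
    outputs = []
    for i, W in enumerate(W_COMP):
        out = W @ x
        if patch is not None and i in patch:
            out = patch[i]
        if scale is not None and i in scale:
            out = scale[i] * out
        outputs.append(out)
        resid = resid + out
    return W_U @ resid, outputs


def logit_diff(logits):
    """Preference for answer A over answer B."""
    return float(logits[0] - logits[1])


def patching_scores(x_clean, x_corrupt):
    """Fraction of the clean-vs-corrupt logit difference that is restored
    when one component's clean activation is patched into the corrupt run."""
    clean_logits, clean_outs = run(x_clean)
    corrupt_logits, _ = run(x_corrupt)
    lo, hi = logit_diff(corrupt_logits), logit_diff(clean_logits)
    scores = []
    for i in range(len(W_COMP)):
        patched_logits, _ = run(x_corrupt, patch={i: clean_outs[i]})
        scores.append((logit_diff(patched_logits) - lo) / (hi - lo))
    return scores


if __name__ == "__main__":
    # Localize: which component carries the A-vs-B preference?
    print(patching_scores(np.array([1.0, 0.0]), np.array([0.0, 1.0])))
    # prints [0.6666666666666666, 0.3333333333333333]

    # Control: on an input where both components fire (a toy "conflict"),
    # ablating component 0 flips the prediction from A to B.
    x_conflict = np.array([1.0, 1.0])
    print(int(np.argmax(run(x_conflict)[0])))                  # prints 0
    print(int(np.argmax(run(x_conflict, scale={0: 0.0})[0])))  # prints 1
```

Because this toy model is linear, the two patching scores sum to 1; in real transformers, components interact nonlinearly, so per-component scores need not add up this neatly.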


Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Topic Modeling