In Praise of Stubbornness: An Empirical Case for Cognitive-Dissonance Aware Continual Update of Knowledge in LLMs

Simone Clemente; Zied Ben Houidi; Alexis Huet; Dario Rossi; Giulio Franzese; Pietro Michiardi

arXiv:2502.04390 · cs.CL · June 11, 2025

Open Access · 1 repository · 3 reviews

TL;DR

This paper empirically demonstrates that large language models are highly susceptible to catastrophic knowledge interference when learning contradictory facts, highlighting the need for architectures that resist contradictions like humans do.

Contribution

It reveals a fundamental limitation of LLMs in handling contradictory information, proposes a simple method for detecting contradictory information, and suggests directions for new architectures with better knowledge retention.
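This page does not describe how the detection method works. As a purely hypothetical illustration, not the authors' method, one simple heuristic flags an update as dissonant when the model already assigns high probability to a different answer for the same prompt. The `belief` dictionary stands in for the model's distribution over candidate answers; the `Update` class, the `classify_update` function, the 0.5 threshold, and the probabilities in the usage lines are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Update:
    prompt: str      # e.g. "The capital of France is"
    new_object: str  # answer the update wants the model to learn


def classify_update(belief: Dict[str, float], update: Update, threshold: float = 0.5) -> str:
    """Label an update 'dissonant' or 'non-dissonant' (hypothetical heuristic).

    belief maps candidate answers to the probability the model currently
    assigns them as a continuation of update.prompt.
    """
    if not belief:
        return "non-dissonant"  # the model has no opinion: the fact is new to it
    current_answer = max(belief, key=lambda ans: belief[ans])
    confident = belief[current_answer] >= threshold
    if confident and current_answer != update.new_object:
        return "dissonant"  # the update contradicts a confidently held belief
    return "non-dissonant"  # novel, or consistent with what the model believes


if __name__ == "__main__":
    belief = {"Paris": 0.92, "Lyon": 0.03}  # toy probabilities, not model output
    print(classify_update(belief, Update("The capital of France is", "Rome")))   # dissonant
    print(classify_update(belief, Update("The capital of France is", "Paris")))  # non-dissonant
```

A real detector would obtain `belief` from the model itself (for example, from next-token probabilities), and the threshold would need tuning; the point here is only the decision rule that separates contradicting updates from new or consistent ones.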

Findings

01

Contradictory updates cause up to 80% loss of unrelated knowledge.

02

Sparing frequently used neurons improves knowledge retention during non-contradictory updates.

03

Contradictory information detection achieves over 95% accuracy.
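Finding 01 is stated as a percentage loss of unrelated knowledge, but the page does not define the metric. One plausible reading, assumed here rather than taken from the paper, is the relative drop in accuracy on unrelated facts that the model answered correctly before the update. The `accuracy` and `knowledge_loss` helpers and the `(prompt, answer)` fact format are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Fact = Tuple[str, str]  # (prompt, expected answer); hypothetical format


def accuracy(predict: Callable[[str], str], facts: Sequence[Fact]) -> float:
    """Fraction of facts for which predict(prompt) returns the expected answer."""
    if not facts:
        return 0.0
    hits = sum(1 for prompt, answer in facts if predict(prompt).strip() == answer)
    return hits / len(facts)


def knowledge_loss(acc_before: float, acc_after: float) -> float:
    """Relative drop in accuracy on unrelated facts, clipped at 0 (assumed metric)."""
    if acc_before == 0:
        return 0.0
    return max(0.0, (acc_before - acc_after) / acc_before)


if __name__ == "__main__":
    facts = [("p1", "a"), ("p2", "b"), ("p3", "c"), ("p4", "d"), ("p5", "e")]
    before = dict(facts)  # toy model that answers every unrelated fact correctly
    after = {"p1": "a"}   # toy model that kept only one fact after a contradictory update
    acc_before = accuracy(lambda p: before.get(p, ""), facts)
    acc_after = accuracy(lambda p: after.get(p, ""), facts)
    print(f"{knowledge_loss(acc_before, acc_after):.0%}")  # prints 80%
```

Normalizing by `acc_before` makes the number read as "share of previously known facts that were lost", which matches the phrasing of the finding; an absolute accuracy drop would be an equally reasonable alternative.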

Abstract

Through systematic empirical investigation, we uncover a fundamental and concerning property of Large Language Models: while they can safely learn facts that don't contradict their knowledge, attempting to update facts with contradictory information triggers catastrophic corruption of unrelated knowledge. Unlike humans, who naturally resist contradictory information, these models indiscriminately accept contradictions, leading to devastating interference that destroys up to 80% of unrelated knowledge even when they learn as few as 10-100 contradicting facts. To understand whether this interference could be mitigated through selective plasticity, we experiment with targeted network updates, distinguishing between previously used (stubborn) and rarely used (plastic) neurons. We uncover another asymmetry: while sparing frequently used neurons significantly improves retention of existing…
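The split between stubborn and plastic neurons described in the abstract can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the authors' implementation (their code is in the repository listed under Code & Models): a neuron counts as "used" on an input when its activation is positive, the most frequently used fraction of neurons is treated as stubborn, and the gradient rows belonging to those neurons are zeroed so an update only changes plastic neurons. The names `activation_frequency`, `plastic_mask`, and `mask_gradient` and the `stubborn_fraction` parameter are hypothetical.

```python
from typing import List, Sequence


def activation_frequency(activations: Sequence[Sequence[float]], eps: float = 0.0) -> List[float]:
    """Fraction of inputs on which each neuron's activation exceeds eps.

    activations[i][j] is neuron j's activation on input i.
    """
    if not activations:
        raise ValueError("need at least one input")
    counts = [0] * len(activations[0])
    for row in activations:
        for j, a in enumerate(row):
            if a > eps:
                counts[j] += 1
    return [c / len(activations) for c in counts]


def plastic_mask(freq: Sequence[float], stubborn_fraction: float) -> List[bool]:
    """True for neurons allowed to change (plastic), False for the most used (stubborn)."""
    k = int(round(stubborn_fraction * len(freq)))
    by_usage = sorted(range(len(freq)), key=lambda j: freq[j], reverse=True)
    stubborn = set(by_usage[:k])
    return [j not in stubborn for j in range(len(freq))]


def mask_gradient(grad_rows: List[List[float]], mask: Sequence[bool]) -> List[List[float]]:
    """Zero the gradient row of every stubborn neuron (one row per neuron)."""
    return [row if keep else [0.0] * len(row) for row, keep in zip(grad_rows, mask)]


if __name__ == "__main__":
    acts = [[0.9, 0.0, 0.3, 0.0],
            [0.5, 0.0, 0.0, 0.0],
            [0.7, 0.2, 0.4, 0.0]]
    mask = plastic_mask(activation_frequency(acts), stubborn_fraction=0.5)
    print(mask)  # [False, True, False, True]
    print(mask_gradient([[1.0, 1.0] for _ in range(4)], mask))
```

In a PyTorch model, the same boolean mask could be applied by multiplying the rows of a layer's weight `.grad` tensor before calling `optimizer.step()`; the pure-Python version above only shows the selection logic.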

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 5

Strengths

1. **Clear Problem Formulation:** The paper is well-motivated, using the intuitive analogy of cognitive dissonance to frame a critical problem in continual learning for LLMs. The distinction between dissonant and non-dissonant updates is a useful lens for analysis.
2. **Rigorous Empirical Work:** The experiments are thoughtfully designed and clearly presented. The findings, particularly the stark contrast in how the model handles the two update types (Figure 3) and the asymmetry of selective…

Weaknesses

However, the paper’s novelty and core contributions are severely undermined by its substantial overlap with the recent work of Sun et al. (ICLR 2025), “How new data permeates LLM knowledge and how to dilute it,” which the authors fail to cite or discuss. The framework introduced by Sun et al. provides a more general explanation for the phenomena observed here, suggesting that the effects described in this paper are a specific, less general instance of the “priming” and “surprisal” mechanisms ide

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. This paper offers hope for preventing catastrophic interference, so it could be directly relevant to LLM deployment, particularly for models that undergo continual updates in production.
2. The method proposed in the paper for identifying stubborn and plastic neurons is straightforward and reasonable. Remarkably, the approach aligns with the lottery ticket hypothesis in several respects and offers a novel way to discover sparse subnetworks.

Weaknesses

1. The experiments are restricted to the GPT family (GPT-2, GPT-J, GPT-4.1). The generalizability of the findings to other model architectures (e.g., LLaMA, Mistral, T5, BERT-style models) remains unclear. Different architectures may store and update knowledge differently, which could significantly affect the conclusions.
2. All experiments rely solely on the CounterFact dataset. The findings may be specific to the types of factual knowledge and counterfactual patterns in this dataset.

Reviewer 03 · Rating 2 · Confidence 4

Strengths

1. The impact of dissonant updates on unrelated knowledge is an interesting phenomenon.
2. The related-work section gives a good overview of prior work, reaching back several decades.
3. The results on fine-tuning GPT-4.1 help establish that the findings hold even for newer models.

Weaknesses

1. The entire argument about selective plasticity and the connection to the human brain is very tenuous and anthropomorphizing. There is no rigorous analysis of why LLM learning should resemble the brain at all. Much simpler explanations must be investigated before making these claims, for example that training an LM on contradicting facts teaches it to simply output the opposite of what is true, similar to “emergent misalignment” [1].
2. The paper sets up the problem as the language model…

Code & Models

Repositories

bendiogene/ConflictAwareLLM
PyTorch · Official

Taxonomy

Topics: Semantic Web and Ontologies · Advanced Database Systems and Queries · Scientific Computing and Data Management