In Praise of Stubbornness: An Empirical Case for Cognitive-Dissonance Aware Continual Update of Knowledge in LLMs
Simone Clemente, Zied Ben Houidi, Alexis Huet, Dario Rossi, Giulio Franzese, Pietro Michiardi

TL;DR
This paper empirically demonstrates that large language models are highly susceptible to catastrophic knowledge interference when learning contradictory facts, highlighting the need for architectures that resist contradictions like humans do.
Contribution
It reveals a fundamental limitation of LLMs in handling contradictory information, proposes a simple method for detecting contradictory updates, and argues for new architectures that retain knowledge better.
Findings
Contradictory updates cause up to 80% loss of unrelated knowledge.
Sparing frequently used neurons during updates improves retention of existing knowledge when the new facts are non-contradictory.
Contradictory information detection achieves over 95% accuracy.
Abstract
Through systematic empirical investigation, we uncover a fundamental and concerning property of Large Language Models: while they can safely learn facts that don't contradict their knowledge, attempting to update facts with contradictory information triggers catastrophic corruption of unrelated knowledge. Unlike humans, who naturally resist contradictory information, these models indiscriminately accept contradictions, leading to devastating interference, destroying up to 80% of unrelated knowledge even when learning as few as 10-100 contradicting facts. To understand whether this interference could be mitigated through selective plasticity, we experiment with targeted network updates, distinguishing between previously used (stubborn) and rarely used (plastic) neurons. We uncover another asymmetry: while sparing frequently-used neurons significantly improves retention of existing…
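The selective-plasticity idea described in the abstract can be illustrated with a minimal sketch. It assumes each neuron has a usage score (the hypothetical array `usage`, for example a mean absolute activation measured on earlier data) and restricts a gradient step to the least-used (plastic) neurons while leaving the frequently used (stubborn) ones untouched. The source does not state the paper's actual criterion for identifying stubborn and plastic neurons, so the names `plastic_mask` and `masked_sgd_step`, the `plastic_fraction` parameter, and the plain SGD update are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def plastic_mask(usage, plastic_fraction):
    """Return a boolean mask that is True for the least-used neurons.

    `usage` is an assumed per-neuron usage score; lower means more plastic.
    """
    n_plastic = int(round(plastic_fraction * usage.size))
    order = np.argsort(usage, kind="stable")  # ascending: least used first
    mask = np.zeros(usage.size, dtype=bool)
    mask[order[:n_plastic]] = True
    return mask


def masked_sgd_step(weights, grads, mask, lr=0.1):
    """Apply an SGD step only to rows (neurons) where mask is True."""
    # Broadcasting the mask over columns zeroes the update for stubborn rows.
    return weights - lr * grads * mask[:, None]


# Hypothetical usage statistics for a layer with four neurons.
usage = np.array([0.9, 0.05, 0.7, 0.01])
mask = plastic_mask(usage, plastic_fraction=0.5)

weights = np.ones((4, 3))       # one row of weights per neuron
grads = np.full((4, 3), 2.0)    # stand-in gradient from a new-fact update
updated = masked_sgd_step(weights, grads, mask, lr=0.1)
```

In this toy setup, neurons 1 and 3 have the lowest usage scores, so only their weight rows change; rows 0 and 2 keep their original values. Inverting the mask would give the opposite strategy, updating only the stubborn neurons.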
Peer Reviews
Decision: Submitted to ICLR 2026
1. **Clear Problem Formulation:** The paper is well-motivated, using the intuitive analogy of cognitive dissonance to frame a critical problem in continual learning for LLMs. The distinction between dissonant and non-dissonant updates is a useful lens for analysis.
2. **Rigorous Empirical Work:** The experiments are thoughtfully designed and clearly presented. The findings, particularly the stark contrast in how the model handles the two update types (Figure 3) and the asymmetry of selective
However, the paper’s novelty and core contributions are severely undermined by its substantial overlap with the recent work of Sun et al. (ICLR 2025), “How new data permeates LLM knowledge and how to dilute it,” which the authors fail to cite or discuss. The framework introduced by Sun et al. provides a more general explanation for the phenomena observed here, suggesting that the effects described in this paper are a specific, less general instance of the “priming” and “surprisal” mechanisms ide
1. This paper offers hope for preventing catastrophic interference, which gives it direct relevance to LLM deployment, particularly for models that undergo continual updates in production.
2. The method proposed in the paper for identifying stubborn and plastic neurons is straightforward and reasonable. Remarkably, the approach aligns with the lottery ticket hypothesis in several respects and offers a novel possibility for discovering sparse subnetworks.
1. The experiments are restricted to the GPT family (GPT-2, GPT-J, GPT-4.1). The generalizability of the findings to other model architectures (e.g., LLaMA, Mistral, T5, BERT-style models) remains unclear. Different architectures may handle knowledge storage and updates differently, which could significantly affect the conclusions.
2. All experiments rely solely on the CounterFact dataset. The findings may be specific to the types of factual knowledge and counterfactual patterns in this dataset.
1. The impact of dissonant updates on unrelated knowledge is an interesting phenomenon.
2. The related work section gives a good overview of prior work, going back several decades.
3. The results on fine-tuning GPT-4.1 help establish that the findings hold even with newer models.
1. The entire argument on selective plasticity and the connection to the human brain is very tenuous and anthropomorphizing. There is no rigorous analysis of why the learning of LLMs should be like the brain at all. There are much simpler explanations that must be investigated before these claims, such as that training an LM on contradicting facts teaches it to just output the opposite of what is true, similar to “emergent misalignment” [1].
2. The paper sets up the problem as the language model
Taxonomy
Topics: Semantic Web and Ontologies · Advanced Database Systems and Queries · Scientific Computing and Data Management
