Tracking the Limits of Knowledge Propagation: How LLMs Fail at Multi-Step Reasoning with Conflicting Knowledge

Yiyang Feng; Zeming Chen; Haotian Wu; Jiawei Zhou; Antoine Bosselut

arXiv:2601.15495·cs.AI·January 23, 2026

TL;DR

This paper introduces TRACK, a benchmark to evaluate how large language models handle conflicting knowledge during multi-step reasoning, revealing that updates can sometimes worsen model performance and highlighting challenges in knowledge integration.

Contribution

The paper presents a new benchmark, TRACK, for assessing LLMs' ability to propagate conflicting knowledge in multi-step reasoning scenarios, addressing a gap in existing evaluation methods.

Findings

01 Providing updated facts can worsen LLM performance.

02 Performance degradation increases with more conflicting facts.

03 Models struggle with faithful integration and reasoning over conflicting knowledge.

Abstract

A common solution for mitigating outdated or incorrect information in Large Language Models (LLMs) is to provide updated facts in-context or through knowledge editing. However, these methods introduce knowledge conflicts when the knowledge update fails to overwrite the model's parametric knowledge, and these conflicts propagate into faulty reasoning. Current benchmarks for this problem largely focus on single knowledge updates and fact recall, without evaluating how these updates affect downstream reasoning. In this work, we introduce TRACK (Testing Reasoning Amid Conflicting Knowledge), a new benchmark for studying how LLMs propagate new knowledge through multi-step reasoning when it conflicts with the model's initial parametric knowledge. Spanning three reasoning-intensive scenarios (WIKI, CODE, and MATH), TRACK introduces multiple, realistic conflicts to mirror real-world complexity.…

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks · Natural Language Processing Techniques