Diagnosing Knowledge Conflict in Multimodal Long-Chain Reasoning

Jing Tang; Kun Wang; Haolang Lu; Hongjin Chen; KaiTao Chen; Zhongxiang Sun; Qiankun Li; Lingjuan Lyu; Guoshun Nan; Zhigang Zeng

arXiv:2602.14518 · cs.AI · February 17, 2026

Open Access

TL;DR

This paper investigates how multimodal large language models handle conflicting knowledge signals during long chain-of-thought reasoning, probing how conflict is encoded in their internal representations and what that encoding implies for diagnosing failures.

Contribution

It formalizes knowledge conflicts in multimodal reasoning, uncovers their internal representations, and offers mechanisms for diagnosis and control of model failures.

Findings

01. Conflict types are linearly separable in internal representations

02. Conflict signals are concentrated in mid-to-late layers

03. Aggregating token signals recovers input-level conflict types
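
The third finding can be illustrated with a small sketch. It is not the authors' procedure: the label names, the noise model, and the choice of a majority vote as the aggregation rule are all assumptions made for illustration. The point it shows is only that per-token predictions that are individually unreliable can still identify the input-level conflict type once they are pooled along a trajectory.

```python
from collections import Counter
import random

# Hypothetical conflict-type labels; the paper's actual taxonomy is not reproduced here.
LABELS = ["type_a", "type_b", "type_c"]

def simulate_token_labels(true_label, n_tokens, p_correct, rng):
    """Simulate noisy per-token predictions for one reasoning trajectory.

    Each token reports the true label with probability p_correct,
    otherwise a uniformly chosen wrong label (an assumed noise model).
    """
    wrong = [label for label in LABELS if label != true_label]
    return [
        true_label if rng.random() < p_correct else rng.choice(wrong)
        for _ in range(n_tokens)
    ]

def aggregate_trajectory(token_labels):
    """Majority vote over token-level labels (assumed aggregation rule)."""
    return Counter(token_labels).most_common(1)[0][0]

rng = random.Random(0)
tokens = simulate_token_labels("type_b", n_tokens=200, p_correct=0.6, rng=rng)
token_accuracy = sum(t == "type_b" for t in tokens) / len(tokens)
recovered = aggregate_trajectory(tokens)
```

Here `token_accuracy` stays well below 1.0 while `recovered` matches the simulated input-level label, because the wrong votes are split across the other labels.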

Abstract

Multimodal large language models (MLLMs) in long chain-of-thought reasoning often fail when different knowledge sources provide conflicting signals. We formalize these failures under a unified notion of knowledge conflict, distinguishing input-level objective conflict from process-level effective conflict. Through probing internal representations, we reveal that: (I) Linear Separability: different conflict types are explicitly encoded as linearly separable features rather than entangled; (II) Depth Localization: conflict signals concentrate in mid-to-late layers, indicating a distinct processing stage for conflict encoding; (III) Hierarchical Consistency: aggregating noisy token-level signals along trajectories robustly recovers input-level conflict types; and (IV) Directional Asymmetry: reinforcing the model's implicit source preference under conflict is far easier than enforcing the…
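
As a rough illustration of what a linear-separability probe involves, the sketch below fits a logistic-regression probe on synthetic vectors. Everything in it is assumed for illustration: the synthetic "hidden states", the binary labelling, the dimensions, and the training settings are hypothetical and do not come from the paper, which works with real MLLM activations and its own conflict taxonomy.

```python
import math
import random

def make_states(n, direction, offset, rng):
    """Synthetic stand-ins for hidden states (assumption, not real activations).

    Label 1 is shifted along +direction and label 0 along -direction,
    with unit Gaussian noise in every coordinate.
    """
    xs, ys = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        sign = 1.0 if y == 1 else -1.0
        xs.append([rng.gauss(0.0, 1.0) + sign * offset * d for d in direction])
        ys.append(y)
    return xs, ys

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def fit_linear_probe(xs, ys, steps=200, lr=0.5):
    """Full-batch gradient descent on the logistic loss."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(score(w, b, x)) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def accuracy(w, b, xs, ys):
    hits = sum((score(w, b, x) > 0) == (y == 1) for x, y in zip(xs, ys))
    return hits / len(xs)

rng = random.Random(0)
dim = 8
raw = [rng.gauss(0.0, 1.0) for _ in range(dim)]
norm = math.sqrt(sum(r * r for r in raw))
direction = [r / norm for r in raw]  # hypothetical "conflict direction"

train_x, train_y = make_states(300, direction, 2.5, rng)
test_x, test_y = make_states(200, direction, 2.5, rng)
w, b = fit_linear_probe(train_x, train_y)
probe_accuracy = accuracy(w, b, test_x, test_y)
```

High held-out accuracy from a probe this simple is the kind of evidence usually read as linear separability. Repeating the fit on activations taken from each layer in turn would be one way to look for the depth pattern the abstract describes, though the paper's exact protocol is not given here.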

Taxonomy

Topics: Speech and dialogue systems · Multimodal Machine Learning Applications · Logic, Reasoning, and Knowledge