Benchmarking Multimodal Knowledge Conflict for Large Multimodal Models

Yifan Jia; Kailin Jiang; Yuyang Liang; Qihan Ren; Yi Xin; Rui Yang; Fenze Feng; Mingcai Chen; Hengyang Lu; Haozhe Wang; Xiaoye Qu; Dongrui Liu; Lizhen Cui; Yuntao Du

arXiv:2505.19509 · cs.LG · May 27, 2025

TL;DR

This paper introduces MMKC-Bench, a comprehensive benchmark for evaluating how large multimodal models handle factual knowledge conflicts across different scenarios, highlighting current models' tendency to favor internal knowledge and the remaining gaps in conflict detection.

Contribution

The paper presents MMKC-Bench, a new benchmark with diverse conflict types and a large dataset for evaluating and improving multimodal knowledge conflict detection in large models.

Findings

1. Current LMMs recognize conflicts but favor internal knowledge.
2. Models struggle with external evidence in conflict scenarios.
3. MMKC-Bench reveals gaps in conflict detection capabilities.

Abstract

Large Multimodal Models (LMMs) face notable challenges when encountering multimodal knowledge conflicts, particularly under retrieval-augmented generation (RAG) frameworks, where contextual information from external sources may contradict the model's internal parametric knowledge, leading to unreliable outputs. However, existing benchmarks fail to reflect such realistic conflict scenarios. Most focus solely on intra-memory conflicts, while context-memory and inter-context conflicts remain largely unexplored. Furthermore, commonly used factual knowledge-based evaluations are often overlooked, and existing datasets lack a thorough investigation into conflict detection capabilities. To bridge this gap, we propose MMKC-Bench, a benchmark designed to evaluate factual knowledge conflicts in both context-memory and inter-context scenarios. MMKC-Bench encompasses three types of multimodal…
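The abstract describes context-memory conflicts as cases where supplied evidence contradicts what the model already "knows". As a rough illustration only, the Python sketch below tallies which source a model's final answer agrees with. A high "memory" share would correspond to the tendency to favor internal knowledge noted in the findings. The `ConflictItem` fields, the exact-match normalization, the memory/context/other labels and the example questions are all hypothetical; they are not taken from MMKC-Bench or its repository.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ConflictItem:
    """One hypothetical context-memory conflict probe (not MMKC-Bench's schema)."""
    question: str
    memory_answer: str   # answer the model gives with no external context
    context_answer: str  # answer implied by the injected, conflicting evidence
    model_answer: str    # answer the model gives once the evidence is supplied


def _norm(text: str) -> str:
    # Lowercase and collapse whitespace; a real evaluation may match more loosely.
    return " ".join(text.lower().split())


def classify(item: ConflictItem) -> str:
    """Label which knowledge source the final answer agrees with."""
    answer = _norm(item.model_answer)
    if answer == _norm(item.context_answer):
        return "context"
    if answer == _norm(item.memory_answer):
        return "memory"
    return "other"


def source_rates(items: list[ConflictItem]) -> dict[str, float]:
    """Fraction of probes resolved in favor of each source."""
    counts = Counter(classify(item) for item in items)
    total = len(items) or 1  # avoid division by zero on an empty list
    return {label: counts[label] / total for label in ("memory", "context", "other")}


# Hypothetical probes: the model keeps its own answer twice, follows the
# context once, and produces an unrelated answer once.
items = [
    ConflictItem("Who painted this?", "Leonardo da Vinci", "Vincent van Gogh", "Leonardo da Vinci"),
    ConflictItem("Which landmark is shown?", "Eiffel Tower", "Big Ben", "big ben"),
    ConflictItem("Which club does he play for?", "Real Madrid", "Chelsea", "Arsenal"),
    ConflictItem("Which city is this?", "Tokyo", "Osaka", "Tokyo"),
]
print(source_rates(items))  # {'memory': 0.5, 'context': 0.25, 'other': 0.25}
```

Exact string matching keeps the sketch transparent; free-form model outputs would usually need a more tolerant matcher before the rates mean much.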

Code & Models

Repositories

mllmkcbench/mllmkc
pytorch · Official

Videos

Benchmarking Multimodal Knowledge Conflict for Large Multimodal Models· underline

Taxonomy

Topics: Multi-Agent Systems and Negotiation

Methods: Attention Is All You Need · Linear Layer · Attention Dropout · Softmax · WordPiece · Weight Decay · Multi-Head Attention · Layer Normalization · Byte Pair Encoding