Uncovering Entity Identity Confusion in Multimodal Knowledge Editing

Shu Wu; Xiaotian Ye; Xinyu Mou; Dongsheng Liu; Xiaohan Wang; Mengqi Zhang

arXiv:2605.06096·cs.CL·May 8, 2026

TL;DR

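This paper investigates Entity Identity Confusion (EIC), a systemic failure in vision-language models after multimodal knowledge editing: text-only queries about the original entity's identity return information about the new entity. It traces the failure to how existing editing methods treat image-entity bindings and proposes strategies to mitigate it.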

Contribution

The paper identifies Entity Identity Confusion (EIC) as a systemic failure mode of multimodal knowledge editing, introduces EC-Bench, a diagnostic benchmark that probes how image-entity bindings shift before and after editing, and offers mitigation strategies.

Findings

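01 After editing, text-only queries about the original entity's identity return information about the new entity (EIC).

02 Existing methods fail to distinguish Image-Entity (I-E) binding from Entity-Entity (E-E) relational knowledge, so edited models overfit E-E associations as a shortcut.

03 Constraining edits at the I-E stage reduces EIC significantly.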

Abstract

Multimodal knowledge editing (MKE) aims to correct the internal knowledge of large vision-language models after deployment, yet the behavioral patterns of post-edit models remain underexplored. In this paper, we identify a systemic failure mode in edited models, termed Entity Identity Confusion (EIC): edited models exhibit an absurd behavior where text-only queries about the original entity's identity unexpectedly return information about the new entity. To rigorously investigate EIC, we construct EC-Bench, a diagnostic benchmark that directly probes how image-entity bindings shift before and after editing. Our analysis reveals that EIC stems from existing methods failing to distinguish between Image-Entity (I-E) binding and Entity-Entity (E-E) relational knowledge in the model, causing models to overfit E-E associations as a shortcut: the image is still perceived as the original…
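The symptom described in the abstract can be illustrated with a small probe. The following is a minimal sketch, not EC-Bench or the authors' metric: the `IdentityProbe` schema, the `answer_fn` callable standing in for an edited model, and the substring-matching heuristic are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class IdentityProbe:
    """One text-only identity question about the original entity.

    Hypothetical schema for illustration; not the EC-Bench format.
    """
    question: str                    # text-only query, no image supplied
    new_entity_markers: Tuple[str, ...]  # strings expected only in answers about the new entity


def is_confused(answer: str, probe: IdentityProbe) -> bool:
    """Crude heuristic: the answer mentions any marker of the new entity."""
    text = answer.lower()
    return any(marker.lower() in text for marker in probe.new_entity_markers)


def eic_rate(answer_fn: Callable[[str], str], probes: Iterable[IdentityProbe]) -> float:
    """Fraction of probes whose answer leaks new-entity information.

    `answer_fn` is an assumed stand-in for querying the edited model with text only.
    """
    probe_list = list(probes)
    if not probe_list:
        return 0.0
    confused = sum(is_confused(answer_fn(p.question), p) for p in probe_list)
    return confused / len(probe_list)
```

Running the same probes through the model before and after an edit and comparing the two rates would show whether the edit changed answers to questions that never involve the image. Substring matching is only a placeholder; a real evaluation would need a more careful answer-matching rule.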
