Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models
Zhen Zeng, Leijiang Gu, Xun Yang, Zhangling Duan, Zenglin Shi, Meng, Wang

TL;DR
This paper introduces a new fine-grained multimodal knowledge editing task for visual and textual information, along with a benchmark and a novel framework that improves editing precision in images with multiple entities.
Contribution
It proposes the FGVEdit benchmark and the MSCKE framework, enabling precise, entity-specific knowledge editing in multimodal models, addressing limitations of previous text-only methods.
Findings
MSCKE outperforms existing methods on FGVEdit
Demonstrates effective precise editing in complex multimodal contexts
Addresses challenges of multimodal knowledge editing with a novel classifier-based approach
Abstract
Knowledge editing aims to efficiently and cost-effectively correct inaccuracies and update outdated information. Recently, there has been growing interest in extending knowledge editing from Large Language Models (LLMs) to Multimodal Large Language Models (MLLMs), which integrate both textual and visual information, introducing additional editing complexities. Existing multimodal knowledge editing works primarily focus on text-oriented, coarse-grained scenarios, failing to address the unique challenges posed by multimodal contexts. In this paper, we propose a visual-oriented, fine-grained multimodal knowledge editing task that targets precise editing in images with multiple interacting entities. We introduce the Fine-Grained Visual Knowledge Editing (FGVEdit) benchmark to evaluate this task. Moreover, we propose a Multimodal Scope Classifier-based Knowledge Editor (MSCKE) framework.…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques
MethodsFocus
