Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models

Zhen Zeng; Leijiang Gu; Xun Yang; Zhangling Duan; Zenglin Shi; Meng Wang

arXiv:2411.12790·cs.CV·November 21, 2024

TL;DR

This paper introduces a new fine-grained multimodal knowledge editing task for visual and textual information, along with a benchmark and a novel framework that improves editing precision in images with multiple entities.

Contribution

The paper proposes the FGVEdit benchmark and the MSCKE framework, which enable precise, entity-specific knowledge editing in multimodal models and address the limitations of earlier text-oriented, coarse-grained methods.

Findings

01 MSCKE outperforms existing methods on FGVEdit.

02 MSCKE performs precise edits in complex multimodal contexts, such as images with multiple interacting entities.

03 A multimodal scope classifier addresses the challenges specific to multimodal knowledge editing.

Abstract

Knowledge editing aims to efficiently and cost-effectively correct inaccuracies and update outdated information. Recently, there has been growing interest in extending knowledge editing from Large Language Models (LLMs) to Multimodal Large Language Models (MLLMs), which integrate both textual and visual information, introducing additional editing complexities. Existing multimodal knowledge editing works primarily focus on text-oriented, coarse-grained scenarios, failing to address the unique challenges posed by multimodal contexts. In this paper, we propose a visual-oriented, fine-grained multimodal knowledge editing task that targets precise editing in images with multiple interacting entities. We introduce the Fine-Grained Visual Knowledge Editing (FGVEdit) benchmark to evaluate this task. Moreover, we propose a Multimodal Scope Classifier-based Knowledge Editor (MSCKE) framework.…
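The abstract names a scope classifier but is cut off before describing it. As a rough illustration only, and not the authors' implementation, a scope-classifier-based editor can be pictured as an edit memory plus a gate that decides whether a query falls within the scope of a stored edit. The sketch below assumes precomputed text and image embeddings and a cosine-similarity gate that requires both modalities to match; `Edit`, `scope_score`, `answer`, and the `threshold` value are all hypothetical names and choices made for this example.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Edit:
    """One stored edit (hypothetical structure, not from the paper)."""
    text_emb: Sequence[float]   # embedding of the edited question
    image_emb: Sequence[float]  # embedding of the edited visual entity
    new_answer: str             # answer to return when the query is in scope


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def scope_score(edit: Edit, text_emb: Sequence[float],
                image_emb: Sequence[float]) -> float:
    """Assumed gate: the query is in scope only if BOTH modalities match,
    so the score is the weaker of the two similarities."""
    return min(cosine(edit.text_emb, text_emb),
               cosine(edit.image_emb, image_emb))


def answer(text_emb: Sequence[float], image_emb: Sequence[float],
           memory: Sequence[Edit], base_model: Callable[[], str],
           threshold: float = 0.8) -> str:
    """Return the edited answer if some stored edit is in scope,
    otherwise fall back to the unedited base model."""
    best = max(memory,
               key=lambda e: scope_score(e, text_emb, image_emb),
               default=None)
    if best is not None and scope_score(best, text_emb, image_emb) >= threshold:
        return best.new_answer
    return base_model()


# Toy usage with 2-dimensional stand-in embeddings.
memory = [Edit(text_emb=[1.0, 0.0], image_emb=[1.0, 0.0], new_answer="edited")]
same_entity = answer([1.0, 0.0], [0.9, 0.1], memory, lambda: "original")
other_entity = answer([1.0, 0.0], [0.0, 1.0], memory, lambda: "original")
```

In this toy setup the same question is asked twice: once about a visual entity close to the edited one, and once about a different entity. Only the first query passes the gate, which is the kind of entity-level distinction a text-only gate could not make.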

Code & Models

Datasets

ZhenZeng/FGVEdit
dataset · 19 downloads

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Focus