MultiMedEdit: A Scenario-Aware Benchmark for Evaluating Knowledge Editing in Medical VQA

Shengtao Wen; Haodong Chen; Yadong Wang; Zhongying Pan; Xiang Chen; Yu Tian; Bo Qian; Dong Liang; Sheng-Jun Huang

arXiv:2508.07022 · cs.AI · August 12, 2025

Open Access

TL;DR

MultiMedEdit introduces a comprehensive benchmark for evaluating knowledge editing in multimodal medical visual question answering, addressing the unique challenges of integrating updated knowledge with visual reasoning in clinical scenarios.

Contribution

MultiMedEdit is the first benchmark designed specifically for knowledge editing in clinical multimodal tasks. It contributes a new metric suite and an extensive experimental analysis.

Findings

01. Current methods struggle with generalization and long-tail reasoning.

02. Significant practical trade-offs exist in edit latency and memory footprint.

03. The benchmark reveals limitations of existing approaches in complex clinical workflows.

Abstract

Knowledge editing (KE) provides a scalable approach for updating factual knowledge in large language models without full retraining. While previous studies have demonstrated effectiveness in general domains and medical QA tasks, little attention has been paid to KE in multimodal medical scenarios. Unlike text-only settings, medical KE demands integrating updated knowledge with visual reasoning to support safe and interpretable clinical decisions. To address this gap, we propose MultiMedEdit, the first benchmark tailored to evaluating KE in clinical multimodal tasks. Our framework spans both understanding and reasoning task types, defines a three-dimensional metric suite (reliability, generality, and locality), and supports cross-paradigm comparisons across general and domain-specific models. We conduct extensive experiments under single-editing and lifelong-editing settings. Results…
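The three metric dimensions named in the abstract can be illustrated with a minimal sketch. The excerpt does not give the paper's exact definitions, so this assumes the definitions commonly used in the knowledge-editing literature: reliability is post-edit accuracy on the edited examples, generality is post-edit accuracy on rephrased variants of them, and locality is the fraction of unrelated probes whose output is unchanged by the edit. The `Model` type, the exact-match scoring, and all function names are hypothetical and chosen for illustration only.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical interface: a model maps (image_id, question) to an answer string.
Model = Callable[[str, str], str]
Example = Tuple[str, str, str]  # (image_id, question, target_answer)
Probe = Tuple[str, str]         # (image_id, question)


def accuracy(model: Model, examples: Sequence[Example]) -> float:
    """Fraction of examples answered with the target (case-insensitive exact match)."""
    if not examples:
        return 0.0
    hits = sum(
        model(img, q).strip().lower() == ans.strip().lower()
        for img, q, ans in examples
    )
    return hits / len(examples)


def locality(pre: Model, post: Model, probes: Sequence[Probe]) -> float:
    """Fraction of unrelated probes whose answer is identical before and after the edit."""
    if not probes:
        return 0.0
    same = sum(pre(img, q) == post(img, q) for img, q in probes)
    return same / len(probes)


def evaluate_edit(
    pre: Model,
    post: Model,
    edit_set: Sequence[Example],
    rephrased_set: Sequence[Example],
    unrelated_probes: Sequence[Probe],
) -> Dict[str, float]:
    """Score one edit on the three assumed dimensions."""
    return {
        "reliability": accuracy(post, edit_set),
        "generality": accuracy(post, rephrased_set),
        "locality": locality(pre, post, unrelated_probes),
    }
```

Exact-match scoring is the simplest choice here; a real evaluation of free-form medical answers would likely need a more tolerant comparison.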

Taxonomy

Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Multimodal Machine Learning Applications