TL;DR
MedMKEB is a comprehensive benchmark for evaluating how reliably and efficiently knowledge can be edited in medical multimodal large language models across image and text modalities, an area that existing benchmarks do not cover systematically.
Contribution
This paper presents MedMKEB, described by the authors as the first systematic benchmark for multimodal medical knowledge editing. It combines diverse editing tasks with expert validation, with the goal of advancing trustworthy medical AI models.
Findings
Existing knowledge editing methods show limitations when applied in medical contexts.
The benchmark exposes weaknesses in the generality and robustness of edits applied to current models.
Expert validation confirms the benchmark's reliability.
Abstract
Recent advances in multimodal large language models (MLLMs) have significantly improved medical AI, enabling it to unify the understanding of visual and textual information. However, as medical knowledge continues to evolve, it is critical to allow these models to efficiently update outdated or incorrect information without retraining from scratch. Although textual knowledge editing has been widely studied, there is still a lack of systematic benchmarks for multimodal medical knowledge editing involving image and text modalities. To fill this gap, we present MedMKEB, the first comprehensive benchmark designed to evaluate the reliability, generality, locality, portability, and robustness of knowledge editing in medical multimodal large language models. MedMKEB is built on a high-quality medical visual question-answering dataset and enriched with carefully constructed editing tasks,…
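
The five criteria named in the abstract can be illustrated with a small scoring sketch. This excerpt does not define the criteria, so the definitions below follow the conventions commonly used in knowledge-editing work and are assumptions, not the paper's protocol. The names `score_edit`, `accuracy`, `unchanged_rate`, and `make_model`, the exact-match scoring, and the `(image_id, question, expected_answer)` probe format are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a model maps (image_id, question) to an answer string.
Model = Callable[[str, str], str]
# Hypothetical probe format: (image_id, question, expected_answer).
Probe = Tuple[str, str, str]


def make_model(table: Dict[Tuple[str, str], str], default: str = "unknown") -> Model:
    """Toy stand-in for an MLLM, backed by a lookup table."""
    return lambda image_id, question: table.get((image_id, question), default)


def accuracy(model: Model, probes: List[Probe]) -> float:
    """Fraction of probes answered with the expected answer (exact match, case-insensitive)."""
    if not probes:
        return 0.0
    hits = sum(
        model(img, q).strip().lower() == ans.strip().lower()
        for img, q, ans in probes
    )
    return hits / len(probes)


def unchanged_rate(pre: Model, post: Model, probes: List[Probe]) -> float:
    """Fraction of probes on which the edited model gives the same answer as before."""
    if not probes:
        return 0.0
    same = sum(pre(img, q) == post(img, q) for img, q, _ in probes)
    return same / len(probes)


def score_edit(
    pre: Model,
    post: Model,
    edit: List[Probe],         # the edited fact itself
    rephrased: List[Probe],    # paraphrases of the edited fact
    unrelated: List[Probe],    # inputs the edit should not affect
    downstream: List[Probe],   # questions that depend on the edited fact
    adversarial: List[Probe],  # perturbed or misleading prompts
) -> Dict[str, float]:
    """Assumed definitions of the five criteria, each scored in [0, 1]."""
    return {
        "reliability": accuracy(post, edit),
        "generality": accuracy(post, rephrased),
        "locality": unchanged_rate(pre, post, unrelated),
        "portability": accuracy(post, downstream),
        "robustness": accuracy(post, adversarial),
    }
```

In this sketch, locality is the only score that compares the model before and after the edit. The other four only check the edited model's answers against the expected ones.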