ACE-$M^3$: Automatic Capability Evaluator for Multimodal Medical Models

Xiechi Zhang; Shunfan Zheng; Linlin Wang; Gerard de Melo; Zhu Cao; Xiaoling Wang; Liang He

arXiv:2412.11453·cs.CL·December 17, 2024

Open Access

TL;DR

ACE-$M^3$ is an open-source, automated evaluator designed specifically for assessing the question answering capabilities of multimodal medical models, combining detailed analysis with efficient training strategies.

Contribution

It introduces a branch-merge architecture and a reward token-based direct preference optimization (RTDPO) strategy for effective, scalable evaluation of medical multimodal models.

Findings

1. Demonstrates high effectiveness in evaluating medical MLLMs
2. Outperforms traditional metrics like ROUGE and BLEU
3. Reduces training time without sacrificing performance

Abstract

As multimodal large language models (MLLMs) gain prominence in the medical field, the need for precise evaluation methods to assess their effectiveness has become critical. While benchmarks provide a reliable means to evaluate the capabilities of MLLMs, traditional metrics like ROUGE and BLEU employed for open-domain evaluation focus only on token overlap and may not align with human judgment. Although human evaluation is more reliable, it is labor-intensive, costly, and not scalable. LLM-based evaluation methods have proven promising, but to date, there is still an urgent need for open-source multimodal LLM-based evaluators in the medical field. To address this issue, we introduce ACE-$M^3$, an open-source Automatic Capability Evaluator for Multimodal Medical Models specifically designed to assess the question answering abilities…
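
The abstract's point that token-overlap metrics may not align with human judgment can be illustrated with a small sketch. The function below is a simplified ROUGE-1-style unigram F1 (whitespace tokenization, no stemming), not the official ROUGE implementation and not part of ACE-$M^3$; the three sentences are invented examples used only for illustration.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1-style score: F1 over overlapping unigrams."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference answer and two hypothetical model answers.
reference = "the chest x-ray shows no evidence of pneumonia"
contradiction = "the chest x-ray shows evidence of pneumonia"  # opposite meaning
paraphrase = "lungs are clear without signs of infection"      # similar meaning

score_contradiction = unigram_f1(contradiction, reference)
score_paraphrase = unigram_f1(paraphrase, reference)

print(f"contradiction: {score_contradiction:.3f}")
print(f"paraphrase:    {score_paraphrase:.3f}")
```

Under these assumptions, the clinically opposite answer scores far higher than the paraphrase, because dropping the single token "no" barely changes the overlap. This is the kind of mismatch that motivates evaluators based on human-aligned criteria rather than surface overlap.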

Taxonomy

Topics: Machine Learning in Healthcare · Electronic Health Records Systems

Methods: Focus · ALIGN