MMedAgent: Learning to Use Medical Tools with Multi-modal Agent
Binxu Li, Tiankai Yan, Yuanting Pan, Jie Luo, Ruiyang Ji, Jiayuan Ding, Zhe Xu, Shilong Liu, Haoyu Dong, Zihao Lin, Yixin Wang

TL;DR
This paper introduces MMedAgent, a multi-modal medical agent that selects and uses specialized medical tools across a range of tasks and modalities, outperforming existing models and adapting readily to new tools.
Contribution
The paper presents the first medical-specific multi-modal agent capable of tool selection and integration, supported by a curated instruction-tuning dataset for diverse medical tasks.
Findings
MMedAgent outperforms state-of-the-art open-source models and GPT-4o on medical tasks.
The agent can be efficiently updated to incorporate new medical tools.
MMedAgent demonstrates superior generality and adaptability on multi-modal medical tasks.
Abstract
Multi-Modal Large Language Models (MLLMs), despite being successful, exhibit limited generality and often fall short when compared to specialized models. Recently, LLM-based agents have been developed to address these challenges by selecting appropriate specialized models as tools based on user inputs. However, such advancements have not been extensively explored within the medical domain. To bridge this gap, this paper introduces the first agent explicitly designed for the medical field, named Multi-modal Medical Agent (MMedAgent). We curate an instruction-tuning dataset comprising six medical tools solving seven tasks across five modalities, enabling the agent to choose the most suitable tools for a given task. Comprehensive experiments demonstrate that MMedAgent achieves superior performance across a variety of medical tasks compared to state-of-the-art…
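The abstract describes an agent that picks a specialized model as a tool for each user request. The sketch below illustrates only that general select-then-execute pattern, under loud assumptions: the `ToolCall` format, the tool names, the stub tools, and the keyword-based `select_tool` are all hypothetical placeholders. In MMedAgent the tool choice is learned from the instruction-tuning dataset, not hard-coded as it is here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ToolCall:
    """Hypothetical structured action: which tool to run and with what arguments."""
    tool: str
    args: Dict[str, str]


@dataclass
class ToolRegistry:
    """Name -> callable table; adding a tool is a single register() call."""
    tools: Dict[str, Callable[..., str]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[..., str]) -> None:
        self.tools[name] = fn

    def run(self, call: ToolCall) -> str:
        if call.tool not in self.tools:
            raise KeyError(f"unknown tool: {call.tool}")
        return self.tools[call.tool](**call.args)


# Hypothetical keyword -> tool table standing in for the MLLM's learned choice.
KEYWORDS = {
    "segment": "segmentation",
    "locate": "grounding",
    "report": "report_generation",
}


def select_tool(request: str, image: str) -> ToolCall:
    """Rule-based placeholder for the agent's tool selection (NOT a learned policy)."""
    text = request.lower()
    for keyword, tool in KEYWORDS.items():
        if keyword in text:
            return ToolCall(tool, {"image": image})
    # No keyword matched: fall back to answering without a specialized tool.
    return ToolCall("chat", {"image": image})


def answer(request: str, image: str, registry: ToolRegistry) -> str:
    """Select a tool, run it, and wrap its output in a final response."""
    call = select_tool(request, image)
    result = registry.run(call)
    return f"[{call.tool}] {result}"


# Hypothetical stub tools; a real system would wrap specialized medical models here.
registry = ToolRegistry()
registry.register("segmentation", lambda image: f"mask for {image}")
registry.register("grounding", lambda image: f"boxes for {image}")
registry.register("report_generation", lambda image: f"report for {image}")
registry.register("chat", lambda image: f"direct answer about {image}")

print(answer("Segment the liver in this CT scan", "ct_001.png", registry))
# → [segmentation] mask for ct_001.png
```

Keeping tools behind a name-to-callable registry means a new tool can be added with one `register` call, without touching the dispatch loop. In a learned agent, the new tool's name would instead need to appear in the model's training or prompt data so the model can choose it.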
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Speech and dialogue systems · Natural Language Processing Techniques
