Parameter-Efficient Fine-Tuning Medical Multimodal Large Language Models for Medical Visual Grounding

Jinlong He; Pengfei Li; Gang Liu; Shenjun Zhong

arXiv:2410.23822·cs.CV·November 1, 2024

TL;DR

This paper introduces PFMVG, a parameter-efficient fine-tuning approach for medical multimodal large language models tailored for visual grounding tasks in medical images, addressing data and cost challenges.

Contribution

The paper presents a fine-tuning method designed specifically for medical visual grounding, which outperforms existing models such as GPT-4v on benchmark datasets.

Findings

01 Achieves competitive results on medical visual grounding benchmarks.

02 Significantly outperforms GPT-4v on the same task.

03 Demonstrates the effectiveness of parameter-efficient fine-tuning for medical multimodal models.

Abstract

Multimodal Large Language Models (MLLMs) inherit the superior text understanding capabilities of LLMs and extend these capabilities to multimodal scenarios. These models achieve excellent results on general-domain multimodal tasks. In the medical domain, however, substantial training costs and the need for extensive medical data make medical MLLMs difficult to develop. Furthermore, because MLLMs answer in free text, tasks such as visual grounding, which require output in a prescribed form, are difficult for them. So far, no medical MLLM work has addressed medical visual grounding. For the medical visual grounding task, which involves identifying locations in medical images based on short text descriptions, we propose Parameter-efficient Fine-tuning medical multimodal large language models for Medical Visual Grounding (PFMVG). To…
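
The abstract notes that free-text answers make it hard to produce grounding output in a prescribed form. The sketch below illustrates that difficulty only; it is not the paper's method. It assumes a hypothetical answer format in which the model writes a box as `[x_min, y_min, x_max, y_max]` somewhere in its reply, and it uses intersection-over-union (IoU), a common overlap score for boxes, as an assumed way to compare a parsed box with a reference box. The names `parse_box` and `iou` are illustrative.

```python
import re
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

# Assumed answer format: four comma-separated numbers in square brackets.
_NUM = r"(-?\d+(?:\.\d+)?)"
_BOX_PATTERN = re.compile(
    r"\[\s*" + _NUM + r"\s*,\s*" + _NUM + r"\s*,\s*" + _NUM + r"\s*,\s*" + _NUM + r"\s*\]"
)


def parse_box(answer: str) -> Optional[Box]:
    """Extract the first well-formed box from a free-text answer, or None."""
    match = _BOX_PATTERN.search(answer)
    if match is None:
        return None  # the model replied in prose without a usable box
    x1, y1, x2, y2 = (float(g) for g in match.groups())
    if x2 <= x1 or y2 <= y1:
        return None  # degenerate or inverted box
    return (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A reply such as "The lesion is at [10, 20, 50, 60]." parses to a box, while a purely descriptive reply such as "upper left lobe" yields `None` and cannot be scored at all, which is the failure mode the abstract describes.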

Code & Models

Models

linhuixiao/Awesome-Visual-Grounding (model)

Taxonomy

Topics: Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications · AI in cancer detection