A Refer-and-Ground Multimodal Large Language Model for Biomedicine

Xiaoshuang Huang; Haifeng Huang; Lingdong Shen; Yehui Yang; Fangxin Shang; Junwei Liu; Jia Liu

arXiv:2406.18146·cs.CV·July 1, 2024

TL;DR

This paper introduces Med-GRIT-270k, a large biomedical dataset for refer-and-ground multimodal tasks, and presents BiRD, a model leveraging this dataset to enhance interactive biomedical image understanding.

Contribution

It creates the first dedicated biomedical refer-and-ground dataset and develops BiRD, a multimodal LLM tailored for biomedical image referencing and grounding tasks.

Findings

01

Med-GRIT-270k enables effective training for biomedical refer-and-ground tasks.

02

BiRD demonstrates strong multimodal, fine-grained interactive capabilities.

03

The approach advances intelligent biomedical assistant development.

Abstract

Multimodal large language models (MLLMs) are developing rapidly, and their significance is increasingly recognized, especially for visual chat through refer and ground functionalities. However, the biomedical field currently exhibits a substantial gap in this area, primarily due to the absence of a dedicated refer and ground dataset for biomedical images. To address this challenge, we devised the Med-GRIT-270k dataset. It comprises 270k question-and-answer pairs and spans eight distinct medical imaging modalities. Most importantly, it is the first dataset dedicated to the biomedical domain that integrates refer and ground conversations. The key idea is to sample large-scale biomedical image-mask pairs from medical segmentation datasets and generate instruction datasets from text using ChatGPT. Additionally, we introduce a Refer-and-Ground Multimodal Large Language Model for…
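The image-mask sampling step described in the abstract can be illustrated with a minimal sketch. It assumes a binary segmentation mask stored as nested lists, derives a tight bounding box from it, and wraps the box in a grounding question-and-answer pair. The function names, the question template, the 0-1000 coordinate scale, and the `<ref>`/`<box>` tag format are all hypothetical choices for illustration; none of them are taken from the paper, and the ChatGPT-based text generation step is not modeled here.

```python
from typing import Dict, List, Tuple

Mask = List[List[int]]  # binary mask: rows of 0/1 values (assumed representation)
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def mask_to_box(mask: Mask) -> Box:
    """Return the tight bounding box around all nonzero pixels of the mask."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        raise ValueError("mask has no foreground pixels")
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def normalize_box(box: Box, width: int, height: int, scale: int = 1000) -> Box:
    """Rescale pixel coordinates to an integer grid of size `scale` (an assumed convention)."""
    x0, y0, x1, y1 = box
    return (
        x0 * scale // width,
        y0 * scale // height,
        x1 * scale // width,
        y1 * scale // height,
    )


def make_grounding_pair(label: str, mask: Mask, scale: int = 1000) -> Dict[str, str]:
    """Build one hypothetical grounding QA pair from a labeled mask."""
    height, width = len(mask), len(mask[0])
    x0, y0, x1, y1 = normalize_box(mask_to_box(mask), width, height, scale)
    return {
        "question": f"Where is the {label} in this image?",
        # Tag format below is illustrative only, not the dataset's actual schema.
        "answer": f"<ref>{label}</ref><box>({x0},{y0}),({x1},{y1})</box>",
    }


if __name__ == "__main__":
    toy_mask = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
    ]
    print(make_grounding_pair("lesion", toy_mask))
```

A referring example would invert the roles, placing the box in the question and the label in the answer; the same two helper functions would apply.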

Code & Models

Repositories

shawnhuang497/bird (paddle, Official)

Taxonomy

Topics: Topic Modeling