MedMax: Mixed-Modal Instruction Tuning for Training Biomedical Assistants

Hritik Bansal; Daniel Israel; Siyan Zhao; Shufan Li; Tung Nguyen; Aditya Grover

arXiv:2412.12661·cs.AI·April 24, 2025

Open Access · 1 Repo · 2 Datasets · 1 Video

TL;DR

MedMax introduces a large-scale, diverse multimodal dataset for biomedical instruction tuning, significantly enhancing the performance of foundation models in biomedical visual question answering and related tasks.

Contribution

We created MedMax, a comprehensive multimodal biomedical dataset, and demonstrated its effectiveness in fine-tuning models for improved biomedical AI assistance.

Findings

01 · 26% performance improvement over Chameleon

02 · 18.3% improvement over GPT-4o in biomedical VQA

03 · Diverse tasks across biomedical domains

Abstract

Recent advancements in mixed-modal generative models have opened new avenues for developing unified biomedical assistants capable of analyzing biomedical images, answering complex questions about them, and generating multimodal patient reports. However, existing datasets face challenges such as small sizes, limited coverage of biomedical tasks and domains, and a reliance on narrow sources. To address these gaps, we present MedMax, a large-scale multimodal biomedical instruction-tuning dataset for mixed-modal foundation models. With 1.47 million instances, MedMax encompasses a diverse range of tasks, including interleaved image-text generation, biomedical image captioning and generation, visual chat, and report understanding. These tasks span knowledge across diverse biomedical domains, including radiology and histopathology, grounded in medical papers and YouTube videos. Subsequently, we…

Code & Models

Repositories

Hritikbansal/medmax · jax · Official

Datasets

Videos

MedMax: Mixed-Modal Instruction Tuning for Training Biomedical Assistants · slideslive

Taxonomy

Topics: Assistive Technology in Communication and Mobility