Scaling Up Biomedical Vision-Language Models: Fine-Tuning, Instruction Tuning, and Multi-Modal Learning

Cheng Peng; Kai Zhang; Mengxian Lyu; Hongfang Liu; Lichao Sun; Yonghui Wu

arXiv:2505.17436 · cs.AI · May 26, 2025

TL;DR

This paper presents the development and evaluation of large-scale biomedical vision-language models, BiomedGPT-Large and XL, demonstrating improved performance across diverse multi-modal biomedical tasks through fine-tuning and instruction tuning.

Contribution

Introduces two scaled biomedical vision-language models with extensive fine-tuning and instruction tuning, enhancing multi-modal biomedical task performance and zero-shot learning capabilities.
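Instruction tuning is generally understood as recasting every task as text-to-text. A natural-language instruction goes into the encoder, together with an image when the task has one, and the answer is the text the decoder learns to produce. The sketch below is a minimal illustration under that assumption. The template strings, task keys, and the `to_instruction_example` helper are hypothetical and are not taken from the BiomedGPT code base.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical instruction templates: the task keys and wording are
# illustrative assumptions, not the prompts used by BiomedGPT.
TEMPLATES: Dict[str, str] = {
    "vqa": "Answer the question about the image: {question}",
    "caption": "Describe the findings in the image.",
    "summarization": "Summarize the following report: {text}",
}


@dataclass
class Seq2SeqExample:
    source: str                       # instruction text given to the encoder
    target: str                       # text the decoder learns to generate
    image_path: Optional[str] = None  # set only for tasks with an image input


def to_instruction_example(task: str, fields: Dict[str, str], target: str,
                           image_path: Optional[str] = None) -> Seq2SeqExample:
    """Render one raw task example as an instruction/target pair (hypothetical helper)."""
    template = TEMPLATES[task]  # raises KeyError for a task with no template
    return Seq2SeqExample(template.format(**fields), target, image_path)


if __name__ == "__main__":
    batch = [
        to_instruction_example("vqa", {"question": "Is there a pleural effusion?"},
                               "yes", "cxr_001.png"),
        to_instruction_example("summarization",
                               {"text": "No acute cardiopulmonary process."},
                               "Normal chest radiograph."),
    ]
    for ex in batch:
        print(ex.source, "->", ex.target)
```

Because examples from different tasks share a single format, they can be mixed in one training stream for one model. In the general instruction-tuning setup, zero-shot use then means writing an instruction for a task the model was not fine-tuned on and reading the decoder's output.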

Findings

1. BiomedGPT-Large and XL outperform previous models on benchmark datasets.
2. Instruction tuning improves zero-shot learning performance.
3. Models effectively handle diverse biomedical multi-modal tasks.

Abstract

Our aims were to advance biomedical vision-language model capabilities through scaling up, fine-tuning, and instruction tuning; to develop vision-language models with improved performance in handling long text; to explore strategies for efficiently adapting vision-language models to diverse multi-modal biomedical tasks; and to examine their zero-shot learning performance. We developed two biomedical vision-language models, BiomedGPT-Large and BiomedGPT-XLarge, based on an encoder-decoder transformer architecture. We fine-tuned the two models on 23 benchmark datasets from 6 multi-modal biomedical tasks: one image-only task (image classification), three language-only tasks (text understanding, text summarization, and question answering), and two vision-language tasks (visual question answering and image captioning). We compared the developed scaled models with our previous BiomedGPT-Base model and…
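The task breakdown in the abstract can be written down as a small registry. This sketch assumes a unified encoder-decoder model in which the same network serves all six task types and only the task-specific inputs differ. The `Task` and `Modality` names and the `encoder_inputs` helper are illustrative and are not part of the taokz/biomedgpt repository.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class Modality(Enum):
    IMAGE_ONLY = "image-only"
    LANGUAGE_ONLY = "language-only"
    VISION_LANGUAGE = "vision-language"


@dataclass(frozen=True)
class Task:
    name: str
    modality: Modality


# The six task types listed in the abstract, grouped by input modality.
TASKS: List[Task] = [
    Task("image classification", Modality.IMAGE_ONLY),
    Task("text understanding", Modality.LANGUAGE_ONLY),
    Task("text summarization", Modality.LANGUAGE_ONLY),
    Task("question answering", Modality.LANGUAGE_ONLY),
    Task("visual question answering", Modality.VISION_LANGUAGE),
    Task("image captioning", Modality.VISION_LANGUAGE),
]


def encoder_inputs(task: Task) -> Tuple[str, ...]:
    """Task-specific inputs to the shared encoder, excluding any instruction prompt.

    Illustrative assumption: the decoder generates text for every task,
    e.g. a class-label string for image classification.
    """
    if task.modality is Modality.IMAGE_ONLY:
        return ("image",)
    if task.modality is Modality.LANGUAGE_ONLY:
        return ("text",)
    return ("image", "text")


def count_by_modality(tasks: List[Task]) -> Dict[str, int]:
    """Number of task types per modality, in Modality declaration order."""
    counts = Counter(t.modality for t in tasks)
    return {m.value: counts[m] for m in Modality}


if __name__ == "__main__":
    print(count_by_modality(TASKS))
    # {'image-only': 1, 'language-only': 3, 'vision-language': 2}
    for t in TASKS:
        print(f"{t.name}: {encoder_inputs(t)}")
```

Keeping modality as data rather than as separate model classes mirrors the idea of a single shared encoder-decoder. Adding a task then means adding a registry entry, not a new architecture.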

Code & Models

Repositories

taokz/biomedgpt (PyTorch)

Taxonomy

Methods: ADaptive gradient method with the OPTimal convergence rate