Advancing High Resolution Vision-Language Models in Biomedicine

Zekai Chen; Arda Pekis; Kevin Brown

arXiv:2406.09454 · cs.CL · June 17, 2024 · 1 citation

TL;DR

This paper introduces a new biomedical vision-language model with a specialized dataset and hierarchical image encoding, achieving state-of-the-art zero-shot performance in biomedical visual question answering.

Contribution

It presents a novel biomedical instruction dataset, a hierarchical image encoding strategy, and the Llama3-Med model with improved zero-shot accuracy.

Findings

01. Achieved over 10% performance improvement on biomedical VQA benchmarks.

02. Developed a new dataset with medical image-text pairs from Claude3-Opus and LLaMA3 70B.

03. Enhanced fine-grained visual understanding with hierarchical image representations.

Abstract

Multi-modal learning has significantly advanced generative AI, especially in vision-language modeling. Innovations like GPT-4V and open-source projects such as LLaVA have enabled robust conversational agents capable of zero-shot task completions. However, applying these technologies in the biomedical field presents unique challenges. Recent initiatives like LLaVA-Med have started to adapt instruction-tuning for biomedical contexts using large datasets such as PMC-15M. Our research offers three key contributions: (i) we present a new instruct dataset enriched with medical image-text pairs from Claude3-Opus and LLaMA3 70B, (ii) we propose a novel image encoding strategy using hierarchical representations to improve fine-grained biomedical visual comprehension, and (iii) we develop the Llama3-Med model, which achieves state-of-the-art zero-shot performance on biomedical visual question…

Code & Models

Repositories

standardmodelbio/llama3-med (PyTorch, official)

Taxonomy

Topics: Biomedical Text Mining and Ontologies