Multimodal Medical Disease Classification with LLaMA II

Christian Gapp; Elias Tappeiner; Martin Welk; Rainer Schubert

arXiv:2412.01306 · cs.AI · September 11, 2025


TL;DR

This paper retrains a multimodal transformer model with a LLaMA II backbone to classify diseases by fusing text and image data, reaching 97.10% mean AUC on the OpenI chest X-ray dataset.

Contribution

It introduces a novel multimodal architecture with early fusion techniques using LLaMA II, outperforming previous models on medical multimodal classification tasks.

Findings

01

Early fusion yields better results than late fusion.

02

Best model achieves 97.10% mean AUC.

03

Model outperforms previous classification models.
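Finding 02 reports a mean AUC, but the page does not say how the mean is taken. A common convention for multi-label chest X-ray classification, assumed here, is to compute the ROC AUC separately for each disease label and average the per-label values (a macro average). The minimal sketch below does this in plain Python, computing each AUC through its equivalence to the Mann-Whitney statistic. The function names `roc_auc` and `mean_auc` and the toy data are hypothetical, not taken from the paper.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC for one binary label via the Mann-Whitney U statistic.

    AUC equals the probability that a randomly chosen positive example
    scores higher than a randomly chosen negative one; ties count as 1/2.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined without both positive and negative examples")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(label_matrix: List[List[int]], score_matrix: List[List[float]]) -> float:
    """Macro average of per-label AUCs (rows = samples, columns = disease labels)."""
    num_labels = len(label_matrix[0])
    aucs = []
    for c in range(num_labels):
        y = [row[c] for row in label_matrix]
        s = [row[c] for row in score_matrix]
        aucs.append(roc_auc(y, s))
    return sum(aucs) / num_labels


# Toy, made-up data: 4 samples, 2 disease labels.
labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
scores = [[0.9, 0.2], [0.3, 0.8], [0.7, 0.4], [0.1, 0.5]]
print(mean_auc(labels, scores))  # 0.875 (label AUCs 1.0 and 0.75)
```

Macro averaging weights every disease label equally, so rare findings count as much as common ones. Whether the reported 97.10% uses this averaging or another one is not stated here.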

Abstract

Medical patient data is always multimodal. Images, text, age, gender and histopathological data are only a few examples of different modalities in this context. Processing and integrating this multimodal data with deep-learning-based methods is of utmost interest because of its huge potential for medical procedures such as diagnosis and treatment planning. In this work we retrain a multimodal transformer-based model for disease classification. To this end we use the text-image pair dataset from OpenI, which consists of 2D chest X-rays paired with clinical reports. Our focus is on fusion methods for merging the text and vision information extracted from medical datasets. We test different architectures with a LLaMA II backbone. Early fusion of modality-specific features yields better results than late fusion from a deeper level, with the best model reaching 97.10% mean AUC…
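The difference between early and late fusion can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the authors' architecture: a single parameter-free self-attention layer (the hypothetical `self_attention`) stands in for the LLaMA II backbone, and text and image inputs are assumed to be already embedded as token vectors of equal width. Early fusion concatenates the modality-specific token sequences before the shared layer. Late fusion runs each modality through its own pass and merges only the pooled, deeper-level outputs.

```python
import math
from typing import List

Vector = List[float]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head dot-product self-attention with identity projections.

    A toy stand-in for a transformer layer: each output token is a
    softmax-weighted average of all input tokens.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(logits)  # subtract the max for a numerically stable softmax
        weights = [math.exp(x - m) for x in logits]
        total = sum(weights)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) / total for i in range(d)])
    return out


def mean_pool(tokens: List[Vector]) -> Vector:
    """Average the token vectors into one fixed-size representation."""
    return [sum(t[i] for t in tokens) / len(tokens) for i in range(len(tokens[0]))]


def early_fusion(text_tokens: List[Vector], image_tokens: List[Vector]) -> Vector:
    # Concatenate the token sequences first, so attention spans both modalities.
    return mean_pool(self_attention(text_tokens + image_tokens))


def late_fusion(text_tokens: List[Vector], image_tokens: List[Vector]) -> Vector:
    # Process each modality on its own; merge only the pooled outputs.
    text_repr = mean_pool(self_attention(text_tokens))
    image_repr = mean_pool(self_attention(image_tokens))
    return text_repr + image_repr  # list concatenation: 2 * d features


text = [[1.0, 0.0], [0.5, 0.5]]  # hypothetical text-token embeddings
image = [[0.0, 1.0]]             # hypothetical image-patch embedding
print(len(early_fusion(text, image)))  # 2
print(len(late_fusion(text, image)))   # 4
```

In the early-fusion path the attention weights cover tokens from both modalities, so image tokens can shape how text tokens are represented. In the late-fusion path the two representations meet only after pooling. Either fused vector would then feed a classification head. The toy does not reproduce the paper's models or results.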


Taxonomy

Methods: LLaMA · Focus