MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation

Gurucharan Marthi Krishna Kumar; Aman Chadha; Janine Mendola; Amir Shmuel

arXiv:2410.02458 · eess.IV · August 20, 2025

Open Access

TL;DR

This paper introduces MedVisionLlama, a novel approach that integrates pre-trained Large Language Model transformer layers into Vision Transformers to significantly improve medical image segmentation accuracy across various modalities.

Contribution

It proposes a hybrid model combining frozen LLM transformer blocks with Vision Transformers, along with a new attention mechanism and multi-scale fusion for enhanced segmentation performance.
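The core idea of placing a frozen transformer block inside a trainable ViT encoder can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the class name `FrozenBlockEncoder`, the dimensions, and the two linear projections that match the ViT width to the LLM width are assumptions made for illustration, and a generic `nn.TransformerEncoderLayer` stands in for a real pre-trained LLM block whose weights would normally be loaded from a checkpoint. The Hybrid Attention Mechanism and Multi-Scale Fusion Block are not modeled here.

```python
import torch
import torch.nn as nn


class FrozenBlockEncoder(nn.Module):
    """Hypothetical ViT-style encoder with one frozen block inserted after it."""

    def __init__(self, vit_dim=64, llm_dim=128, n_heads=4):
        super().__init__()
        # Trainable ViT-side transformer block operating on patch tokens.
        self.vit_block = nn.TransformerEncoderLayer(
            d_model=vit_dim, nhead=n_heads,
            dim_feedforward=4 * vit_dim, batch_first=True)
        # Assumed trainable projections: ViT width -> LLM width -> ViT width.
        self.proj_in = nn.Linear(vit_dim, llm_dim)
        self.proj_out = nn.Linear(llm_dim, vit_dim)
        # Stand-in for a pre-trained LLM block (randomly initialized here).
        self.llm_block = nn.TransformerEncoderLayer(
            d_model=llm_dim, nhead=n_heads,
            dim_feedforward=4 * llm_dim, batch_first=True)
        # Freeze it: its parameters receive no gradient updates.
        self.llm_block.requires_grad_(False)

    def forward(self, tokens):
        x = self.vit_block(tokens)   # (batch, n_patches, vit_dim)
        x = self.proj_in(x)          # (batch, n_patches, llm_dim)
        x = self.llm_block(x)        # frozen block refines the tokens
        return self.proj_out(x)      # back to (batch, n_patches, vit_dim)


model = FrozenBlockEncoder()
patches = torch.randn(2, 16, 64)     # 2 images, 16 patch tokens each
out = model(patches)

n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
n_frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
print(out.shape, n_trainable, n_frozen)
```

Only the ViT block and the projections are updated during training in this sketch; the frozen block contributes its fixed weights without adding trainable parameters.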

Findings

01  Average Dice score increased from 0.74 to 0.79

02  Significant improvements in accuracy and precision

03  Effective integration of LLMs into vision models
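For readers unfamiliar with the metric, the Dice score measures overlap between a predicted mask A and a ground-truth mask B as 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (perfect overlap). The NumPy sketch below computes it for binary masks; the function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions for illustration, and the toy masks are not data from the paper.

```python
import numpy as np


def dice_score(pred, target):
    """Dice coefficient between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return float(2.0 * intersection / total)


# Toy 2x3 masks: 3 predicted pixels, 3 true pixels, 2 in common.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
score = dice_score(pred, target)
print(round(score, 4))  # 2 * 2 / (3 + 3) = 0.6667
```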

Abstract

Large Language Models (LLMs), known for their versatility in textual data, are increasingly being explored for their potential to enhance medical image segmentation, a crucial task for accurate diagnostic imaging. This study explores enhancing Vision Transformers (ViTs) for medical image segmentation by integrating pre-trained LLM transformer blocks. Our approach, which incorporates a frozen LLM transformer block into the encoder of a ViT-based model, leads to substantial improvements in segmentation performance across various medical imaging modalities. We propose a Hybrid Attention Mechanism that combines global and local feature learning with a Multi-Scale Fusion Block for aggregating features across different scales. The enhanced model shows significant performance gains, including an average Dice score increase from 0.74 to 0.79 and improvements in accuracy, precision, and the…

Taxonomy

Topics: Radiomics and Machine Learning in Medical Imaging · Topic Modeling · COVID-19 diagnosis using AI

Methods: Softmax · Attention Is All You Need