Scaling Down to Scale Up: Towards Operationally-Efficient and Deployable Clinical Models via Cross-Modal Low-Rank Adaptation for Medical Vision-Language Models

Thuraya Alzubaidi; Farhad R. Nezami; Muzammil Behzad

arXiv:2512.00597 · cs.CV · December 2, 2025

Open Access

TL;DR

This paper presents MedCT-VLM, a parameter-efficient vision-language model for chest CT imaging. It adapts the CT-CLIP foundation model to multi-label pathology classification with Low-Rank Adaptation (LoRA), improving AUROC by 7.6 percentage points on zero-shot pathology classification.

Contribution

The paper applies Low-Rank Adaptation (LoRA) to the attention layers of both the vision and text encoders of CT-CLIP, training only 1.67M of the model's 440M parameters (0.38%) while improving zero-shot pathology classification.

Findings

01. LoRA fine-tuning improves AUROC by 7.6 percentage points.

02. Model achieves higher accuracy and macro-F1 scores after adaptation.

03. Parameter-efficient adaptation enables effective transfer from large-scale pretraining.
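The findings above report AUROC and macro-F1 for a multi-label task. As a rough illustration of how such macro-averaged metrics are commonly computed (one score per pathology, then an unweighted mean), here is a minimal pure-Python sketch. The labels, scores, and the 0.5 decision threshold are toy values invented for illustration; they are not the paper's data, and the paper's exact evaluation protocol is not stated on this page.

```python
def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1(labels, preds):
    """Binary F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_average(values):
    """Unweighted mean over classes, so rare pathologies count as much as common ones."""
    return sum(values) / len(values)


# Toy data (hypothetical): 4 scans, 2 pathologies.
labels_per_class = [[1, 0, 1, 0], [0, 0, 1, 1]]
scores_per_class = [[0.9, 0.2, 0.6, 0.7], [0.1, 0.4, 0.8, 0.9]]
threshold = 0.5  # assumed decision threshold for F1

macro_auroc = macro_average(
    [auroc(y, s) for y, s in zip(labels_per_class, scores_per_class)]
)
macro_f1 = macro_average(
    [
        f1(y, [1 if v >= threshold else 0 for v in s])
        for y, s in zip(labels_per_class, scores_per_class)
    ]
)
print(f"macro AUROC = {macro_auroc:.3f}, macro F1 = {macro_f1:.3f}")
```

Note that AUROC is threshold-free while F1 depends on the chosen threshold, so the two metrics can move differently after adaptation.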

Abstract

Foundation models trained via vision-language pretraining have demonstrated strong zero-shot capabilities across diverse image domains, yet their application to volumetric medical imaging remains limited. We introduce MedCT-VLM: Medical CT Vision-Language Model, a parameter-efficient vision-language framework designed to adapt large-scale CT foundation models for downstream clinical tasks. MedCT-VLM uses a parameter-efficient approach to adapt CT-CLIP, a contrastive vision-language model trained on 25,692 chest CT volumes, for multi-label pathology classification using Low-Rank Adaptation (LoRA). Rather than fine-tuning the model's 440M parameters directly, we insert low-rank decomposition matrices into attention layers of both vision and text encoders, training only 1.67M parameters (0.38% of total). We evaluate on zero-shot classification across 18 thoracic pathologies, where the…
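To make the low-rank idea in the abstract concrete, the following is a minimal pure-Python sketch of a generic LoRA-adapted linear layer: the pretrained weight W stays frozen and only two small matrices A and B are trained, so the layer computes y = xWᵀ + (alpha / r) · xAᵀBᵀ. This is not the authors' implementation. The class name `LoRALinear`, the rank, the `alpha` scaling, the tiny 2×3 weight, and the 768-dimensional example are all assumptions chosen for illustration; only the 1.67M and 440M parameter counts come from the abstract.

```python
import random


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both lists of rows."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters added by one LoRA pair: A (rank x d_in) plus B (d_out x rank)."""
    return rank * (d_in + d_out)


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (hypothetical sketch)."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # frozen pretrained weight, d_out x d_in
        self.scale = alpha / rank     # standard LoRA scaling factor
        # A starts small and random, B starts at zero, so the initial update is zero.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        """x is batch x d_in; returns batch x d_out."""
        base = matmul(x, transpose(self.weight))
        update = matmul(matmul(x, transpose(self.A)), transpose(self.B))
        return [
            [b + self.scale * u for b, u in zip(b_row, u_row)]
            for b_row, u_row in zip(base, update)
        ]


# Tiny toy layer: d_out = 2, d_in = 3, rank = 1 (illustrative sizes only).
W = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
layer = LoRALinear(W, rank=1, alpha=2.0)
x = [[1.0, 0.0, -1.0]]
y_initial = layer.forward(x)   # equals the frozen layer's output because B is zero

layer.B = [[1.0], [0.0]]       # pretend training moved B; only output 0 is affected
y_adapted = layer.forward(x)

# Fraction of trainable parameters reported in the abstract.
trainable_fraction_pct = 100 * 1.67e6 / 440e6
print(f"trainable fraction = {trainable_fraction_pct:.2f}%")
print("extra params for one assumed 768x768 projection at rank 8:",
      lora_param_count(768, 768, 8))
```

Initialising B to zero means the adapted model starts out identical to the pretrained one, which is the usual LoRA design choice; the parameter saving comes from r(d_in + d_out) being far smaller than d_in · d_out when the rank r is small.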

Taxonomy

Topics: COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning · Artificial Intelligence in Healthcare and Education