LoGSAM: Parameter-Efficient Cross-Modal Grounding for MRI Segmentation

Mohammad Robaitul Islam Bhuiyan; Sheethal Bhat; Melika Qahqaie; Tri-Thien Nguyen; Paula Andrea Perez-Toro; Tomas Arias-Vergara; Andreas Maier

arXiv:2603.17576·cs.CV·March 30, 2026

TL;DR

LoGSAM introduces a parameter-efficient, speech-guided MRI tumor segmentation framework that leverages pretrained foundation models with minimal updates, achieving state-of-the-art accuracy and high case-level reliability.

Contribution

The paper presents a novel modular pipeline for MRI tumor segmentation that combines speech transcription, NLP, vision-language detection, and segmentation models, and adapts them with minimal parameter updates.
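To make "minimal parameter updates" concrete, the sketch below counts the trainable weights that low-rank adaptation (LoRA, the method named in the abstract) adds next to frozen linear layers. The layer shapes and rank are hypothetical values chosen for illustration; they are not the Grounding DINO configuration used in the paper, and the resulting fraction is not meant to reproduce the reported 5%.

```python
from typing import Iterable, Tuple


def lora_added_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights LoRA adds to one frozen (d_out x d_in) linear layer.

    LoRA learns two low-rank factors, B (d_out x rank) and A (rank x d_in),
    so the count is rank * (d_in + d_out).
    """
    return rank * (d_in + d_out)


def trainable_fraction(layer_shapes: Iterable[Tuple[int, int]], rank: int) -> float:
    """Share of all weights (frozen + LoRA) that are actually trained."""
    shapes = list(layer_shapes)
    frozen = sum(d_in * d_out for d_in, d_out in shapes)
    added = sum(lora_added_params(d_in, d_out, rank) for d_in, d_out in shapes)
    return added / (frozen + added)


# Hypothetical example: four 256x256 projection layers adapted at rank 8.
fraction = trainable_fraction([(256, 256)] * 4, rank=8)
print(f"{fraction:.4f}")  # prints 0.0588
```

Because the added count grows linearly in the rank while the frozen count is fixed, the trained share can be tuned by choosing the rank and the set of adapted layers.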

Findings

01. Achieved an 80.32% Dice score on the BRISC 2025 dataset.

02. Attained 91.7% case-level accuracy on unseen MRI scans.

03. Enabled efficient domain adaptation with only 5% of model parameters updated.
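For readers unfamiliar with the Dice score reported in the findings, here is a minimal sketch of how it is commonly computed for a pair of binary masks. The function name, the nested-list mask format, and the empty-mask convention are assumptions made for illustration; the paper's own evaluation code is not shown in this page.

```python
from typing import Sequence


def dice_score(pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]) -> float:
    """Dice = 2 * |P intersect T| / (|P| + |T|) for two same-shaped binary masks."""
    intersection = 0
    pred_count = 0
    target_count = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            pred_count += 1 if p else 0
            target_count += 1 if t else 0
            intersection += 1 if (p and t) else 0
    if pred_count + target_count == 0:
        # Both masks empty: treated here as perfect agreement (one common convention).
        return 1.0
    return 2.0 * intersection / (pred_count + target_count)


# Toy 2x3 masks: 3 predicted pixels, 3 ground-truth pixels, 2 shared.
predicted = [[0, 1, 1],
             [0, 1, 0]]
ground_truth = [[0, 1, 0],
                [0, 1, 1]]
print(f"{dice_score(predicted, ground_truth):.4f}")  # prints 0.6667
```

A dataset-level figure such as the one above is typically an average of per-image scores, although the averaging scheme used in the paper is not stated here.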

Abstract

Precise localization and delineation of brain tumors using Magnetic Resonance Imaging (MRI) are essential for planning therapy and guiding surgical decisions. However, most existing approaches rely on task-specific supervised models and are constrained by the limited availability of annotated data. To address this, we propose LoGSAM, a parameter-efficient, detection-driven framework that transforms radiologist dictation into text prompts for foundation-model-based localization and segmentation. Radiologist speech is first transcribed and translated using a pretrained Whisper ASR model, followed by negation-aware clinical NLP to extract tumor-specific textual prompts. These prompts guide text-conditioned tumor localization via a LoRA-adapted vision-language detection model, Grounding DINO (GDINO). The LoRA adaptation updates only 5% of the model parameters, thereby enabling…
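The negation-aware prompt extraction step described in the abstract can be illustrated with a small rule-based sketch. This is not the authors' clinical NLP component: the cue list, the tumor vocabulary, and the function name are all hypothetical, and the rule only handles negation cues that appear before the term in the same sentence.

```python
import re
from typing import List

# Hypothetical cue list and vocabulary, chosen only for illustration.
NEGATION_CUES = ("no", "not", "without", "absent", "negative for")
TUMOR_TERMS = ("glioma", "meningioma", "pituitary tumor")


def extract_prompts(transcript: str) -> List[str]:
    """Return tumor terms that are mentioned without a preceding negation cue."""
    prompts: List[str] = []
    # Split the transcript into rough sentences so a cue only affects its own sentence.
    for sentence in re.split(r"[.;\n]+", transcript.lower()):
        for term in TUMOR_TERMS:
            match = re.search(r"\b" + re.escape(term) + r"\b", sentence)
            if match is None:
                continue
            preceding = sentence[: match.start()]
            # Simplification: any cue earlier in the sentence negates the term.
            negated = any(
                re.search(r"\b" + re.escape(cue) + r"\b", preceding)
                for cue in NEGATION_CUES
            )
            if not negated and term not in prompts:
                prompts.append(term)
    return prompts


report = "No evidence of meningioma. There is a glioma in the left temporal lobe."
print(extract_prompts(report))  # prints ['glioma']
```

Only the affirmed finding is kept as a prompt, so a text-conditioned detector is not asked to localize a tumor type that the dictation explicitly ruled out. A production system would also need to handle cues that follow the term ("mass is not seen") and to limit how far a cue's scope extends.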
