Visual Instruction-Finetuned Language Model for Versatile Brain MR Image Tasks
Jonghun Kim, Sinyoung Ra, Hyunjin Park

TL;DR
This paper introduces LLaBIT, a versatile brain MRI task model that integrates visual reasoning with language understanding, outperforming specialized models across multiple clinical tasks.
Contribution
The paper presents LLaBIT, a novel multi-task brain MRI model that effectively combines visual and language reasoning, addressing diverse clinical tasks within a single framework.
Findings
LLaBIT outperforms task-specific models in all evaluated brain MRI tasks.
Reusing feature maps mitigates spatial information loss.
Text data augmentation improves model performance when image-text pairs are limited.
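To make the text data augmentation finding concrete, here is a minimal sketch of one common way to multiply scarce image-text pairs: rephrasing the same instruction through several templates. This is an illustration only, not the authors' method. The templates, the function name `augment_instruction`, and its parameters are all hypothetical, since this summary does not describe the paper's actual prompts or augmentation procedure.

```python
import random

# Hypothetical instruction templates (assumption: the paper's real prompts
# are not given in this summary).
TEMPLATES = [
    "Segment the {target} in this {modality} brain MR image.",
    "Please outline the {target} on the {modality} scan.",
    "Given the {modality} image, produce a mask of the {target}.",
]


def augment_instruction(target, modality, n_variants, seed=0):
    """Return up to n_variants distinct phrasings of one instruction.

    The number of variants is capped at the number of templates, and a
    seeded generator keeps the selection reproducible.
    """
    rng = random.Random(seed)
    k = min(n_variants, len(TEMPLATES))
    chosen = rng.sample(TEMPLATES, k)  # sampling without replacement
    return [t.format(target=target, modality=modality) for t in chosen]


# Each phrasing can be paired with the same image to form extra training pairs.
variants = augment_instruction("tumor", "T1-weighted", 2)
for text in variants:
    print(text)
```

Pairing one image with several phrasings increases the number of distinct image-text pairs without requiring new scans, which is the general idea behind augmenting text when paired data is limited.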
Abstract
LLMs have demonstrated remarkable capabilities in linguistic reasoning and are increasingly adept at vision-language tasks. The integration of image tokens into transformers has enabled direct visual input and output, advancing research from image-to-text descriptions to text-to-image generation. However, simple text-to-image generation holds limited clinical utility. In medical imaging, tasks such as image segmentation for localizing pathologies or image translation for reconstructing missing sequences have much greater clinical importance. Despite this, integrating these diverse, clinically relevant tasks within a single, versatile language model remains unexplored. Our method, LLaBIT (Large Language Model for Brain Image Translation), extends the visual reasoning of LLMs to these clinically meaningful tasks in the brain MRI domain. To mitigate the spatial information loss inherent in…
