Unifying Segment Anything in Microscopy with Vision-Language Knowledge
Manyu Li, Ruian He, Zixian Zhang, Chenxi Ma, Weimin Tan, Bo Yan

TL;DR
This paper introduces uLLSAM, a framework that enhances biomedical image segmentation by integrating vision-language knowledge into the Segment Anything Model, significantly improving performance and generalization across microscopy datasets.
Contribution
The paper presents a novel approach for incorporating multimodal large language models into segmentation models, unifying microscopy segmentation with vision-language understanding.
Findings
11.8% improvement in segmentation accuracy on in-domain datasets
9.2% improvement in out-of-domain dataset performance
State-of-the-art results on multiple microscopy datasets
Abstract
Accurate segmentation of regions of interest in biomedical images holds substantial value in image analysis. Although several foundation models for biomedical segmentation have achieved excellent performance on certain datasets, they typically perform sub-optimally on data from unseen domains. We attribute this deficiency to a lack of vision-language knowledge before segmentation. Multimodal Large Language Models (MLLMs) bring outstanding understanding and reasoning capabilities to multimodal tasks, which inspires us to leverage MLLMs to inject Vision-Language Knowledge (VLK), thereby enabling vision models to generalize better on cross-domain datasets. In this paper, we propose a novel framework, named uLLSAM, that uses MLLMs to guide SAM in learning microscopy cross-domain data, unifying Segment Anything in Microscopy. Specifically, we…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
Methods: Segment Anything Model
