Unifying Segment Anything in Microscopy with Vision-Language Knowledge

Manyu Li; Ruian He; Zixian Zhang; Chenxi Ma; Weimin Tan; Bo Yan

arXiv:2505.10769 · cs.CV · November 17, 2025

TL;DR

This paper introduces uLLSAM, a framework that enhances biomedical image segmentation by integrating vision-language knowledge into the Segment Anything Model, significantly improving performance and generalization across microscopy datasets.

Contribution

The paper presents a novel approach to incorporate multimodal large language models into segmentation models, unifying microscopy segmentation with vision-language understanding.

Findings

01. 11.8% improvement in segmentation accuracy on in-domain datasets
02. 9.2% improvement in out-of-domain dataset performance
03. State-of-the-art results on multiple microscopy datasets

Abstract

Accurate segmentation of regions of interest in biomedical images holds substantial value in image analysis. Although several foundation models for biomedical segmentation have achieved excellent performance on certain datasets, they typically perform sub-optimally on data from unseen domains. We attribute this deficiency to the lack of vision-language knowledge before segmentation. Multimodal Large Language Models (MLLMs) bring outstanding understanding and reasoning capabilities to multimodal tasks, which inspires us to leverage MLLMs to inject Vision-Language Knowledge (VLK), thereby enabling vision models to generalize better on cross-domain datasets. In this paper, we propose a novel framework, named uLLSAM, that uses MLLMs to guide SAM in learning cross-domain microscopy data, unifying Segment Anything in Microscopy. Specifically, we…

Code & Models

Repositories

ieellee/ullsam
pytorch

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications

Methods: Segment Anything Model