TL;DR
ESICA is a scalable, efficient framework for text-guided 3D medical image segmentation that improves semantic alignment, boundary accuracy, and generalization across multiple imaging modalities.
Contribution
The paper introduces ESICA, a lightweight framework that combines a similarity-matrix-based mask prediction formulation, an efficient decomposed decoder with adapter modules, and a two-pass refinement strategy that sharpens boundaries and resolves uncertain regions.
Findings
Achieves state-of-the-art accuracy on the CVPR BiomedSegFM benchmark.
The ESICA4 Lite variant maintains high performance with fewer parameters.
Demonstrates effective segmentation across five diverse imaging modalities.
Abstract
Text-guided 3D medical image segmentation offers a flexible alternative to class-based and spatial-prompt-based models by allowing users to specify regions of interest directly in natural language. This paradigm avoids reliance on predefined label sets, reduces ambiguous outputs, and aligns more naturally with clinical workflows. However, existing text-guided frameworks are often computationally expensive, exhibit weak alignment between text and volume features, and fail to capture fine anatomical details. We propose ESICA, a lightweight and scalable framework that addresses these challenges through three innovations: (1) a similarity-matrix-based mask prediction formulation that enhances semantic alignment, (2) an efficient decomposed decoder with adapter modules for accurate volumetric decoding, and (3) a two-pass refinement strategy that sharpens boundaries and resolves uncertain regions. To…
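The abstract does not spell out the similarity-matrix formulation, so the following is only a minimal sketch of the general idea: score every voxel feature against every text-prompt embedding and threshold the result into one mask per prompt. It is not the authors' implementation. The function name `similarity_mask`, the use of cosine similarity, the sigmoid with a `temperature` parameter, and the fixed `threshold` are all assumptions made for illustration.

```python
import numpy as np


def similarity_mask(voxel_feats, text_embs, temperature=0.1, threshold=0.5):
    """Hypothetical similarity-matrix mask prediction (illustrative only).

    voxel_feats: array of shape (D, H, W, C), one feature vector per voxel.
    text_embs:   array of shape (K, C), one embedding per text prompt.
    Returns (sim, masks), both of shape (D, H, W, K).
    """
    d, h, w, c = voxel_feats.shape

    # Flatten the volume so each row is one voxel feature vector.
    v = voxel_feats.reshape(-1, c)

    # L2-normalize both sides so the dot product is a cosine similarity.
    # The small epsilon guards against division by zero for empty features.
    v = v / (np.linalg.norm(v, axis=1, keepdims=True) + 1e-8)
    t = text_embs / (np.linalg.norm(text_embs, axis=1, keepdims=True) + 1e-8)

    # Similarity matrix: rows are voxels, columns are text prompts.
    sim = v @ t.T

    # Assumed post-processing: temperature-scaled sigmoid, then a threshold.
    probs = 1.0 / (1.0 + np.exp(-sim / temperature))
    masks = probs > threshold

    return sim.reshape(d, h, w, -1), masks.reshape(d, h, w, -1)
```

In this sketch a single prompt gives `K = 1`, so the returned mask has a trailing dimension of size one. In a trained model the voxel features would come from the image decoder and the embeddings from a text encoder, neither of which is modeled here.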
