ESICA: A Scalable Framework for Text-Guided 3D Medical Image Segmentation

Yu Xin; Gorkem Can Ates; Jun Ma; Sumin Kim; Ying Zhang; Kaleb E Smith; Kuang Gong; Wei Shao

arXiv:2604.24876 · cs.CV · April 29, 2026

TL;DR

ESICA is a scalable, efficient framework for text-guided 3D medical image segmentation that improves semantic alignment, boundary accuracy, and generalization across multiple imaging modalities.

Contribution

The paper introduces ESICA, a lightweight framework with novel similarity matrix-based mask prediction, an efficient decoder, and a two-pass refinement strategy for improved segmentation.

Findings

01. Achieves state-of-the-art accuracy on the CVPR BiomedSegFM benchmark.

02. The ESICA4 Lite variant maintains high performance with fewer parameters.

03. Demonstrates effective segmentation across five diverse imaging modalities.

Abstract

Text-guided 3D medical image segmentation offers a flexible alternative to class-based and spatial-prompt-based models by allowing users to specify regions of interest directly in natural language. This paradigm avoids reliance on predefined label sets, reduces ambiguous outputs, and aligns more naturally with clinical workflows. However, existing text-guided frameworks are often computationally expensive, exhibit weak alignment between text and volume features, and fail to capture fine anatomical details. We propose ESICA, a lightweight and scalable framework that addresses these challenges through three innovations: (1) a similarity-matrix-based mask prediction formulation that enhances semantic alignment, (2) an efficient decomposed decoder with adapter modules for accurate volumetric decoding, and (3) a two-pass refinement strategy that sharpens boundaries and resolves uncertain regions. To…
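As a rough illustration of what a similarity-matrix-based mask prediction could look like, the sketch below scores every voxel feature against every text-prompt embedding and thresholds the result. It is not taken from the paper or the mirthAI/ESICA repository: the function names `similarity_masks` and `uncertain_voxels`, the cosine similarity, the `temperature` scaling, the sigmoid, and the 0.3 to 0.7 uncertainty band are all assumptions chosen for illustration. The second helper only hints at how a second pass might pick out low-confidence voxels to revisit.

```python
import numpy as np


def similarity_masks(voxel_feats, text_embeds, temperature=10.0, threshold=0.5):
    """Hypothetical sketch: one mask per text prompt from a similarity matrix.

    voxel_feats: array of shape (C, D, H, W), one C-dim feature per voxel.
    text_embeds: array of shape (K, C), one C-dim embedding per prompt.
    Returns (probs, masks), both of shape (K, D, H, W).
    """
    channels = voxel_feats.shape[0]
    spatial = voxel_feats.shape[1:]
    flat = voxel_feats.reshape(channels, -1)  # (C, N) with N = D*H*W

    # L2-normalise both sides so the dot product is a cosine similarity.
    flat = flat / (np.linalg.norm(flat, axis=0, keepdims=True) + 1e-8)
    text = text_embeds / (np.linalg.norm(text_embeds, axis=1, keepdims=True) + 1e-8)

    sim = text @ flat  # (K, N) similarity matrix: prompts x voxels
    probs = 1.0 / (1.0 + np.exp(-temperature * sim))  # assumed sigmoid scoring
    probs = probs.reshape(text_embeds.shape[0], *spatial)
    return probs, probs > threshold


def uncertain_voxels(probs, low=0.3, high=0.7):
    """Assumed heuristic: flag voxels whose probability is near the boundary."""
    return (probs > low) & (probs < high)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(8, 4, 5, 6))   # toy feature volume
    prompts = rng.normal(size=(2, 8))       # two toy prompt embeddings
    probs, masks = similarity_masks(feats, prompts)
    print(probs.shape, masks.dtype, int(uncertain_voxels(probs).sum()))
```

In this toy setup each prompt yields its own probability volume, so adding a prompt only adds a row to the similarity matrix rather than a new output head.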

Code & Models

Repositories

mirthAI/ESICA (GitHub)
