Accelerating Volumetric Medical Image Annotation via Short-Long Memory SAM 2

Yuwen Chen; Zafer Yildiz; Qihang Li; Yaqian Chen; Haoyu Dong; Hanxue Gu; Nicholas Konz; Maciej A. Mazurowski

arXiv:2505.01854 · eess.IV · November 4, 2025

TL;DR

This paper introduces SLM-SAM 2, a SAM 2 variant with distinct short-term and long-term memory banks that propagates masks more reliably across slices, improving both the accuracy and the efficiency of annotating volumetric medical images.

Contribution

The paper proposes SLM-SAM 2, which integrates distinct short-term and long-term memory modules to reduce the error propagation that arises when SAM 2 relies on a single memory bank and attention module.

Findings

01

Outperforms SAM 2 by 0.14 and 0.10 in Dice score.

02

Reduces mask correction time by approximately 60.6%.

03

Demonstrates robustness across MRI, CT, and ultrasound datasets.
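
The Dice figures above compare a predicted mask with a reference mask. As a minimal sketch of how such a score is commonly computed for binary masks (the function name `dice_coefficient` and the empty-mask convention are assumptions made for illustration, not the paper's evaluation code):

```python
import numpy as np


def dice_coefficient(pred, target):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / denom)


# Toy example: a 2x2 target (4 pixels) and a 2x3 prediction (6 pixels)
# that fully covers the target, so the overlap is 4 pixels.
target = np.zeros((4, 4), dtype=bool)
target[1:3, 1:3] = True
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:4] = True

score = dice_coefficient(pred, target)  # 2 * 4 / (6 + 4) = 0.8
empty_score = dice_coefficient(np.zeros((4, 4)), np.zeros((4, 4)))
```

How empty slices are scored matters for volumetric data, because slices beyond the target's extent have an empty reference mask. The convention used here is only one possible choice.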

Abstract

Manual annotation of volumetric medical images, such as magnetic resonance imaging (MRI) and computed tomography (CT), is a labor-intensive and time-consuming process. Recent advancements in foundation models for video object segmentation, such as Segment Anything Model 2 (SAM 2), offer a potential opportunity to significantly speed up the annotation process by manually annotating one or a few slices and then propagating target masks across the entire volume. However, the performance of SAM 2 in this context varies. Our experiments show that relying on a single memory bank and attention module is prone to error propagation, particularly at boundary regions where the target is present in the previous slice but absent in the current one. To address this problem, we propose Short-Long Memory SAM 2 (SLM-SAM 2), a novel architecture that integrates distinct short-term and long-term memory…
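
The boundary failure mode described in the abstract can be illustrated with a deliberately crude toy. The sketch below is not SAM 2 or SLM-SAM 2: `propagate_naive` is a hypothetical stand-in that simply copies the previous slice's mask, chosen only to show how a propagator that leans too heavily on the preceding slice keeps predicting the target after it has disappeared.

```python
import numpy as np


def propagate_naive(volume_shape, start_idx, start_mask):
    """Hypothetical propagator: each slice inherits its neighbor's mask unchanged."""
    depth = volume_shape[0]
    preds = np.zeros(volume_shape, dtype=bool)
    preds[start_idx] = start_mask
    # Propagate forward from the annotated slice.
    for i in range(start_idx + 1, depth):
        preds[i] = preds[i - 1]
    # Propagate backward from the annotated slice.
    for i in range(start_idx - 1, -1, -1):
        preds[i] = preds[i + 1]
    return preds


# Toy ground truth: 8 slices of 6x6 pixels, target present only in slices 2..5.
gt = np.zeros((8, 6, 6), dtype=bool)
gt[2:6, 2:4, 2:4] = True

# Annotate slice 3 by hand, then propagate through the whole volume.
preds = propagate_naive(gt.shape, start_idx=3, start_mask=gt[3])

# Count false-positive pixels per slice: errors appear exactly where the
# target is absent but the previous slice still contained it.
false_pos_per_slice = np.logical_and(preds, ~gt).sum(axis=(1, 2))
```

In this toy, slices inside the target's extent are segmented perfectly, while every slice past either boundary inherits a spurious mask.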

Taxonomy

Topics: Image Retrieval and Classification Techniques · AI in cancer detection · Medical Image Segmentation Techniques

Methods: Softmax · Attention Is All You Need · Segment Anything Model · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings