MAIS: Memory-Attention for Interactive Segmentation

Mauricio Orbes-Arteaga; Oeslle Lucena; Sabastien Ourselin; M. Jorge Cardoso

arXiv:2505.07511 · cs.CV · May 13, 2025

TL;DR

MAIS introduces a Memory-Attention mechanism that leverages past user inputs and segmentation states to improve the efficiency and accuracy of interactive medical image segmentation, especially when using ViT-based models.

Contribution

The paper proposes a novel Memory-Attention mechanism for interactive segmentation that integrates temporal context from previous interactions into ViT-based segmentation models.

Findings

1. Improved segmentation accuracy across multiple imaging modalities.
2. Reduced number of user interactions needed for accurate segmentation.
3. Enhanced efficiency in interactive medical segmentation tasks.
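This page does not say how "number of user interactions needed" is measured. A common convention in interactive segmentation is to count interaction rounds until the prediction first reaches a fixed overlap threshold. The sketch below assumes Dice as the overlap metric and a default threshold of 0.9; both are hypothetical choices for illustration, not the paper's evaluation protocol, and the function names are invented here.

```python
def dice(pred, target):
    """Dice coefficient between two binary masks given as flat lists of 0/1."""
    intersection = sum(p & t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks are treated as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def interactions_to_reach(masks_per_round, target, threshold=0.9):
    """Return the 1-based index of the first interaction round whose predicted
    mask reaches `threshold` Dice against `target`, or None if none does.

    Assumption: masks_per_round[i] is the prediction after the (i+1)-th
    user interaction (e.g. click)."""
    for round_idx, mask in enumerate(masks_per_round, start=1):
        if dice(mask, target) >= threshold:
            return round_idx
    return None


if __name__ == "__main__":
    target = [1, 1, 1, 1, 0, 0]
    rounds = [
        [1, 0, 0, 0, 0, 0],  # after click 1: Dice 0.4
        [1, 1, 1, 0, 0, 0],  # after click 2: Dice ~0.857
        [1, 1, 1, 1, 0, 0],  # after click 3: Dice 1.0
    ]
    print(interactions_to_reach(rounds, target))  # 3
```

Under this kind of metric, a method that uses past interactions well would show a lower count averaged over a dataset.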

Abstract

Interactive medical segmentation reduces annotation effort by refining predictions through user feedback. Vision Transformer (ViT)-based models, such as the Segment Anything Model (SAM), achieve state-of-the-art performance using user clicks and prior masks as prompts. However, existing methods treat interactions as independent events, leading to redundant corrections and limited refinement gains. We address this by introducing MAIS, a Memory-Attention mechanism for Interactive Segmentation that stores past user inputs and segmentation states, enabling temporal context integration. Our approach enhances ViT-based segmentation across diverse imaging modalities, achieving more efficient and accurate refinements.
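The abstract describes the mechanism only at a high level: past user inputs and segmentation states are stored and used as temporal context for the current refinement. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes each interaction round is summarised as a fixed-size embedding, keeps those embeddings in a hypothetical first-in-first-out memory (`InteractionMemory`, with an illustrative `capacity` parameter), and lets the current image tokens cross-attend to the memory with single-head scaled dot-product attention. Learned query/key/value projections, multiple heads, and the ViT backbone are deliberately omitted.

```python
import math
from dataclasses import dataclass, field


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


@dataclass
class InteractionMemory:
    """Hypothetical FIFO memory of past interaction states.

    Each entry is an embedding assumed to summarise one interaction round
    (e.g. the user's click together with the mask predicted after it)."""
    capacity: int = 8
    entries: list = field(default_factory=list)

    def store(self, embedding):
        self.entries.append(list(embedding))
        if len(self.entries) > self.capacity:
            self.entries.pop(0)  # drop the oldest interaction


def memory_attention(tokens, memory):
    """Cross-attention from current tokens (queries) to stored memory
    entries (used directly as keys and values), added back to the tokens
    as a residual. With an empty memory the tokens are returned unchanged."""
    if not memory.entries:
        return [list(t) for t in tokens]
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for q in tokens:
        weights = softmax([dot(q, k) * scale for k in memory.entries])
        context = [sum(w * m[i] for w, m in zip(weights, memory.entries))
                   for i in range(d)]
        fused.append([qi + ci for qi, ci in zip(q, context)])
    return fused


if __name__ == "__main__":
    memory = InteractionMemory(capacity=2)
    tokens = [[1.0, 0.0], [0.0, 1.0]]
    print(memory_attention(tokens, memory))  # [[1.0, 0.0], [0.0, 1.0]]
    memory.store([1.0, 0.0])  # state after a first interaction
    print(memory_attention(tokens, memory))  # [[2.0, 0.0], [1.0, 1.0]]
```

The point of the sketch is the data flow: without memory, each refinement sees only the current prompt, matching the "independent events" behaviour the abstract criticises; with memory, every token is conditioned on earlier rounds.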

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Vision Transformer · Dropout · Layer Normalization · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Softmax