Autoregressive Medical Image Segmentation via Next-Scale Mask Prediction

Tao Chen; Chenhui Wang; Zhihao Chen; Hongming Shan

arXiv:2502.20784 · eess.IV · March 3, 2025

TL;DR

This paper introduces AR-Seg, a novel autoregressive framework for medical image segmentation that models inter-scale dependencies to improve accuracy and robustness, outperforming existing methods on benchmark datasets.

Contribution

AR-Seg is the first to explicitly model inter-scale dependencies using an autoregressive approach with multi-scale mask autoencoding and consensus aggregation.
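
To make the idea of a multi-scale mask representation concrete, here is a toy sketch in Python. It is not the paper's autoencoder: plain average pooling and nearest-neighbour upsampling stand in for the learned encoder, codebook, and decoder, and the residual coarse-to-fine decomposition is an assumption that the text above does not confirm. The function names and the scale sizes are hypothetical.

```python
import numpy as np

def downsample(x, size):
    """Average-pool a square array to size x size (side must be divisible by size)."""
    factor = x.shape[0] // size
    return x.reshape(size, factor, size, factor).mean(axis=(1, 3))

def upsample(x, size):
    """Nearest-neighbour upsample a square array to size x size."""
    factor = size // x.shape[0]
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def multiscale_residual_maps(mask, scale_sizes):
    """Decompose a mask into coarse-to-fine maps (assumed residual scheme).

    Each scale stores what the coarser scales have not yet explained.
    No quantization or learning happens here; this is only a stand-in.
    """
    full = mask.shape[0]
    residual = mask.astype(float)
    maps = []
    for size in scale_sizes:
        coarse = downsample(residual, size)
        maps.append(coarse)
        residual = residual - upsample(coarse, full)
    return maps

def reconstruct(maps, size):
    """Sum the upsampled maps to recover the full-resolution mask."""
    return sum(upsample(m, size) for m in maps)

# Hypothetical 8x8 binary mask with a square foreground region.
mask = np.zeros((8, 8))
mask[2:6, 3:7] = 1
maps = multiscale_residual_maps(mask, [1, 2, 4, 8])
recovered = reconstruct(maps, 8)
```

Because the finest scale in this sketch matches the mask resolution, summing the upsampled maps recovers the mask exactly; partial sums over the first few scales give progressively sharper approximations.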

Findings

01. AR-Seg outperforms state-of-the-art methods on benchmark datasets.

02. The intermediate predictions can be visualized, exposing the coarse-to-fine segmentation process.

03. AR-Seg improves robustness and accuracy in complex anatomical regions.

Abstract

While deep learning has significantly advanced medical image segmentation, most existing methods still struggle with handling complex anatomical regions. Cascaded or deep supervision-based approaches attempt to address this challenge through multi-scale feature learning but fail to establish sufficient inter-scale dependencies, as each scale relies solely on the features of the immediate predecessor. To this end, we propose the AutoRegressive Segmentation framework via next-scale mask prediction, termed AR-Seg, which progressively predicts the next-scale mask by explicitly modeling dependencies across all previous scales within a unified architecture. AR-Seg introduces three innovations: (1) a multi-scale mask autoencoder that quantizes the mask into multi-scale token maps to capture hierarchical anatomical structures, (2) a next-scale autoregressive mechanism that progressively…
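
The contrast the abstract draws, between a scale that sees only its immediate predecessor and a scale that sees all previous scales, can be sketched as two attention masks over a flattened coarse-to-fine token sequence. This is a minimal illustration in Python under stated assumptions: square token maps, the hypothetical side lengths `[1, 2, 4]`, and a block-wise causal mask as the mechanism. The truncated abstract does not confirm how AR-Seg implements the dependency, and both function names are made up for this example.

```python
import numpy as np

def _scale_ids(scale_sizes):
    """Scale index of every token in the flattened coarse-to-fine sequence."""
    tokens_per_scale = [s * s for s in scale_sizes]
    return np.repeat(np.arange(len(scale_sizes)), tokens_per_scale)

def all_previous_scales_mask(scale_sizes):
    """mask[i, j] is True if token i may attend to token j.

    A token at scale k sees every token at scales 0..k (block-wise causal).
    """
    ids = _scale_ids(scale_sizes)
    return ids[:, None] >= ids[None, :]

def predecessor_only_mask(scale_sizes):
    """A token at scale k sees only scale k and scale k-1 (cascade-style)."""
    ids = _scale_ids(scale_sizes)
    diff = ids[:, None] - ids[None, :]
    return (diff == 0) | (diff == 1)

scale_sizes = [1, 2, 4]  # hypothetical token-map side lengths
full_mask = all_previous_scales_mask(scale_sizes)
cascade_mask = predecessor_only_mask(scale_sizes)

# Token 5 is the first token of the finest scale; token 0 is the coarsest one.
sees_coarsest_full = bool(full_mask[5, 0])
sees_coarsest_cascade = bool(cascade_mask[5, 0])
```

In this sketch the finest-scale token can attend to the coarsest token under the first mask but not under the second, which is the difference in inter-scale dependency the abstract describes.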
