NEMESIS: Noise-suppressed Efficient MAE with Enhanced Superpatch Integration Strategy
Kyeonghun Kim, Hyeonseok Jung, Youngung Han, Hyunsu Go, Eunseob Choi, Seongbin Park, Junsu Lim, Jiwon Yang, Sumin Lee, Insung Hwang, Ken Ying-Kai Liao, Nam-Joon Kim

TL;DR
NEMESIS is a memory-efficient, noise-enhanced masked autoencoder framework for 3D CT imaging that captures anatomical details and achieves high accuracy with limited labeled data.
Contribution
It introduces a superpatch-based MAE with dual-masking and cross-scale tokens, improving efficiency and performance in 3D medical image self-supervised learning.
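To make the superpatch idea concrete, the sketch below picks a random cubic crop from a larger CT volume and reports how small that crop is relative to the whole scan. The 128x128x128 size is taken from the abstract. The function names, the uniform random sampling policy, and the example volume shape are assumptions made for illustration and are not taken from the paper.

```python
import random


def sample_superpatch_origin(volume_shape, patch_size=128, rng=None):
    """Pick the corner (z, y, x) of a random cubic crop that fits in the volume.

    Uniform sampling is an assumption; the paper may choose crops differently.
    """
    rng = rng or random.Random()
    if any(dim < patch_size for dim in volume_shape):
        raise ValueError("volume is smaller than the superpatch along some axis")
    # randint is inclusive on both ends, so the crop always stays in bounds.
    return tuple(rng.randint(0, dim - patch_size) for dim in volume_shape)


def superpatch_slices(origin, patch_size=128):
    """Turn a crop origin into slice objects usable on an array-like volume."""
    return tuple(slice(o, o + patch_size) for o in origin)


def voxel_ratio(volume_shape, patch_size=128):
    """Fraction of the full volume's voxels contained in one superpatch."""
    total = 1
    for dim in volume_shape:
        total *= dim
    return patch_size ** 3 / total


if __name__ == "__main__":
    shape = (256, 512, 512)  # hypothetical CT volume (depth, height, width)
    origin = sample_superpatch_origin(shape, rng=random.Random(0))
    print(superpatch_slices(origin))
    print(voxel_ratio(shape))  # 0.03125, i.e. 1/32 of the voxels
```

For the hypothetical 256x512x512 scan, one superpatch holds 1/32 of the voxels, which illustrates why training on local crops needs far less memory than feeding a full volume to a transformer.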
Findings
Achieves 0.9633 AUROC on the BTCV benchmark with a frozen backbone.
Maintains 0.9075 AUROC with only 10% labeled data.
Reduces computational cost to 31.0 GFLOPs per forward pass.
Abstract
Volumetric CT imaging is essential for clinical diagnosis, yet annotating 3D volumes is expensive and time-consuming, motivating self-supervised learning (SSL) from unlabeled data. However, applying SSL to 3D CT remains challenging due to the high memory cost of full-volume transformers and the anisotropic spatial structure of CT data, which is not well captured by conventional masking strategies. We propose NEMESIS, a masked autoencoder (MAE) framework that operates on local 128×128×128 superpatches, enabling memory-efficient training while preserving anatomical detail. NEMESIS introduces three key components: (i) noise-enhanced reconstruction as a pretext task, (ii) Masked Anatomical Transformer Blocks (MATB) that perform dual-masking through parallel plane-wise and axis-wise token removal, and (iii) NEMESIS Tokens (NT) for cross-scale context aggregation. On the BTCV multi-organ…
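The abstract describes dual-masking as parallel plane-wise and axis-wise token removal but does not give the details here. The following is one possible reading, sketched on a small 3D grid of token coordinates: the plane-wise branch drops every token in randomly chosen depth planes, and the axis-wise branch drops every token along randomly chosen lines that run through the depth axis. The grid size, the masking counts, the choice of the depth axis, and all function names are assumptions for illustration, not the authors' MATB implementation.

```python
import random


def plane_mask(grid, n_planes, rng):
    """Coordinates removed by dropping whole planes at random depth indices."""
    d, h, w = grid
    planes = rng.sample(range(d), n_planes)
    return {(z, y, x) for z in planes for y in range(h) for x in range(w)}


def axis_mask(grid, n_lines, rng):
    """Coordinates removed by dropping whole lines running along the depth axis."""
    d, h, w = grid
    candidates = [(y, x) for y in range(h) for x in range(w)]
    lines = rng.sample(candidates, n_lines)
    return {(z, y, x) for (y, x) in lines for z in range(d)}


def dual_mask(grid, n_planes, n_lines, seed=0):
    """Return the visible token sets for the two (assumed) parallel branches."""
    rng = random.Random(seed)
    d, h, w = grid
    all_tokens = {(z, y, x) for z in range(d) for y in range(h) for x in range(w)}
    visible_plane = all_tokens - plane_mask(grid, n_planes, rng)
    visible_axis = all_tokens - axis_mask(grid, n_lines, rng)
    return visible_plane, visible_axis


if __name__ == "__main__":
    # Hypothetical 8x8x8 token grid; mask half of the tokens in each branch.
    vis_p, vis_a = dual_mask((8, 8, 8), n_planes=4, n_lines=32)
    print(len(vis_p), len(vis_a))  # 256 256
```

In this reading, the two branches see complementary kinds of structure: the plane-wise branch keeps intact slices and must infer missing ones, while the axis-wise branch keeps intact through-depth columns and must infer in-plane context.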
