Hi-End-MAE: Hierarchical encoder-driven masked autoencoders are stronger vision learners for medical image segmentation
Fenghe Tang, Qingsong Yao, Wenxin Ma, Chenxu Wu, Zihang Jiang, S. Kevin Zhou

TL;DR
Hi-End-MAE introduces a hierarchical encoder-driven masked autoencoder framework for medical image segmentation, leveraging multi-layer representations and encoder-guided reconstruction to improve pre-training effectiveness on large-scale unlabeled datasets.
Contribution
It proposes a novel ViT-based pre-training method with encoder-driven reconstruction and hierarchical decoding, enhancing feature learning for medical image segmentation.
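The contribution depends on reading representations from several ViT layers rather than only from the output layer. The NumPy code below is a minimal sketch under stated assumptions. It runs a stack of toy residual blocks, which are hypothetical stand-ins for ViT blocks and contain no attention, and it collects hidden states from chosen layers. The names `toy_block` and `encode_with_taps` and the tap indices [2, 5, 8, 11] are illustrative only. How Hi-End-MAE's decoder actually consumes these features is not specified here.

```python
import numpy as np

def toy_block(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for a ViT block: a residual linear + tanh update.
    # Real ViT blocks use multi-head self-attention followed by an MLP.
    return x + np.tanh(x @ w)

def encode_with_taps(x: np.ndarray, weights, tap_layers):
    """Run a stack of toy blocks over visible tokens.

    Returns the final-layer output and a list of intermediate hidden states
    for the requested 0-indexed layers, ordered shallow to deep.
    """
    tap_set = set(tap_layers)
    taps = []
    for i, w in enumerate(weights):
        x = toy_block(x, w)
        if i in tap_set:
            taps.append(x)  # keep this layer's representation
    return x, taps

rng = np.random.default_rng(0)
dim, depth = 8, 12
weights = [rng.standard_normal((dim, dim)) * 0.1 for _ in range(depth)]
tokens = rng.standard_normal((2, 4, dim))  # (batch, visible patches, dim)

# Output-layer-only features vs. evenly spaced multi-layer taps.
final, taps = encode_with_taps(tokens, weights, tap_layers=[2, 5, 8, 11])
```

With taps available, a decoder could receive one intermediate feature per stage instead of a single final-layer tensor. Which encoder layer feeds which decoder stage is a design choice that this page does not settle.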
Findings
Outperforms existing methods on seven benchmarks.
Achieves superior transfer learning capabilities.
Effective on large-scale unlabeled medical datasets.
Abstract
Medical image segmentation remains a formidable challenge due to label scarcity. Pre-training Vision Transformer (ViT) through masked image modeling (MIM) on large-scale unlabeled medical datasets presents a promising solution, providing both computational efficiency and model generalization for various downstream tasks. However, current ViT-based MIM pre-training frameworks predominantly emphasize local aggregation representations in output layers and fail to exploit the rich representations across different ViT layers that better capture fine-grained semantic information needed for more precise medical downstream tasks. To fill the above gap, we hereby present Hierarchical Encoder-driven MAE (Hi-End-MAE), a simple yet effective ViT-based pre-training solution, which centers on two key innovations: (1) Encoder-driven reconstruction, which encourages the encoder to learn more…
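The abstract builds on MAE-style masked image modeling. In this approach, most patch tokens are hidden and only the visible ones are encoded. The NumPy sketch below shows generic per-sample random masking as in the original MAE. The same code applies to 2D or 3D patches once they are flattened into a token sequence. It is not Hi-End-MAE's encoder-driven reconstruction, whose details are cut off in this excerpt. The function name `random_masking` and the mask ratio of 0.75 are assumptions for illustration.

```python
import numpy as np

def random_masking(tokens: np.ndarray, mask_ratio: float, rng: np.random.Generator):
    """Per-sample random masking in the style of the original MAE.

    tokens: (batch, num_patches, dim) patch embeddings.
    Returns the visible tokens, a binary mask (1 = masked) in the original
    patch order, and the indices that undo the random shuffle.
    """
    batch, num_patches, _ = tokens.shape
    len_keep = int(num_patches * (1 - mask_ratio))

    # Sorting per-patch noise yields an independent random permutation per sample.
    noise = rng.random((batch, num_patches))
    ids_shuffle = np.argsort(noise, axis=1)
    ids_restore = np.argsort(ids_shuffle, axis=1)

    # Keep the first len_keep patches of each shuffled sequence.
    ids_keep = ids_shuffle[:, :len_keep]
    visible = np.take_along_axis(tokens, ids_keep[:, :, None], axis=1)

    # Mask in shuffled order (0 = kept, 1 = masked), then map back to patch order.
    mask = np.ones((batch, num_patches))
    mask[:, :len_keep] = 0
    mask = np.take_along_axis(mask, ids_restore, axis=1)
    return visible, mask, ids_restore

rng = np.random.default_rng(0)
patches = rng.standard_normal((2, 16, 8))  # (batch, patches, embed dim)
visible, mask, ids_restore = random_masking(patches, mask_ratio=0.75, rng=rng)
# visible.shape == (2, 4, 8); each row of mask contains 12 ones (masked patches)
```

Only `visible` would be passed to the encoder. `mask` marks which patches a reconstruction loss would target, and `ids_restore` lets a decoder put tokens back in their spatial order.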
Taxonomy
Topics: AI in cancer detection · Radiomics and Machine Learning in Medical Imaging · Medical Image Segmentation Techniques
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Masked autoencoder · Adam · Softmax · Absolute Position Encodings · Mutual Information Machine/Mask Image Modeling · Dropout
