Hi-End-MAE: Hierarchical encoder-driven masked autoencoders are stronger vision learners for medical image segmentation

Fenghe Tang; Qingsong Yao; Wenxin Ma; Chenxu Wu; Zihang Jiang; S. Kevin Zhou

arXiv:2502.08347·cs.CV·February 13, 2025

TL;DR

Hi-End-MAE introduces a hierarchical encoder-driven masked autoencoder framework for medical image segmentation, leveraging multi-layer representations and encoder-guided reconstruction to improve pre-training effectiveness on large-scale unlabeled datasets.

Contribution

It proposes a novel ViT-based pre-training method with encoder-driven reconstruction and hierarchical decoding, enhancing feature learning for medical image segmentation.

Findings

01. Outperforms existing methods on seven benchmarks.
02. Achieves superior transfer learning capabilities.
03. Effective on large-scale unlabeled medical datasets.

Abstract

Medical image segmentation remains a formidable challenge due to label scarcity. Pre-training a Vision Transformer (ViT) through masked image modeling (MIM) on large-scale unlabeled medical datasets presents a promising solution, providing both computational efficiency and model generalization for various downstream tasks. However, current ViT-based MIM pre-training frameworks predominantly emphasize local aggregation representations in output layers and fail to exploit the rich representations across different ViT layers, which better capture the fine-grained semantic information needed for more precise medical downstream tasks. To fill this gap, we present Hierarchical Encoder-driven MAE (Hi-End-MAE), a simple yet effective ViT-based pre-training solution, which centers on two key innovations: (1) Encoder-driven reconstruction, which encourages the encoder to learn more…
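For readers unfamiliar with masked image modeling, the masking step it relies on can be sketched as follows. This is a generic MAE-style illustration, not the Hi-End-MAE implementation: the function name `random_patch_mask`, the 75% mask ratio, and the 96×96×96 volume with 16×16×16 patches are all assumptions chosen for the example and are not taken from the paper.

```python
import random


def random_patch_mask(num_patches: int, mask_ratio: float, seed: int = 0):
    """Randomly split patch indices into (visible, masked) lists.

    Generic MAE-style masking: the encoder would see only the visible
    patches, and the model is trained to reconstruct the masked ones.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)          # local RNG so the split is reproducible
    order = list(range(num_patches))
    rng.shuffle(order)                 # random permutation of patch indices
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


# Assumed example: a 96^3 volume cut into 16^3 patches gives 6 * 6 * 6 = 216 patches.
patches_per_axis = 96 // 16
num_patches = patches_per_axis ** 3
visible, masked = random_patch_mask(num_patches, mask_ratio=0.75)
print(len(visible), len(masked))
```

With these assumed settings, the encoder would process only a quarter of the patches, which is where the computational efficiency mentioned in the abstract comes from.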

Code & Models

Repositories

fenghetan9/hi-end-mae (PyTorch, official)

Taxonomy

Topics: AI in cancer detection · Radiomics and Machine Learning in Medical Imaging · Medical Image Segmentation Techniques

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Masked autoencoder · Adam · Softmax · Absolute Position Encodings · Masked Image Modeling (MIM) · Dropout