HySparK: Hybrid Sparse Masking for Large Scale Medical Image Pre-Training
Fenghe Tang, Ronghao Xu, Qingsong Yao, Xueming Fu, Quan Quan, Heqin Zhu, Zaiyi Liu, S. Kevin Zhou

TL;DR
HySparK introduces a hybrid CNN-Transformer pre-training method using sparse masking for large-scale 3D medical images, improving representation learning and transferability in downstream tasks.
Contribution
This work presents a hybrid sparse masking pre-training strategy that combines CNNs and Transformers for 3D medical images, a setting that had received little prior attention.
Findings
Robust transferability to supervised downstream tasks.
Effective dense multi-scale feature reconstruction.
Promising results on large-scale 3D medical datasets.
Abstract
Generative self-supervised learning exhibits remarkable representation learning capabilities. However, little attention has been paid to end-to-end pre-training methods based on a hybrid architecture of CNNs and Transformers, which can learn strong local and global representations simultaneously. To address this issue, we propose a generative pre-training strategy called Hybrid Sparse masKing (HySparK), based on masked image modeling, and apply it to large-scale pre-training on medical images. First, we perform a bottom-up 3D hybrid masking strategy on the encoder to keep the masking consistent. Then we utilize sparse convolution for the top CNNs and encode unmasked patches for the bottom vision Transformers. Second, we employ a simple hierarchical decoder with skip-connections to achieve dense multi-scale feature reconstruction. Third, we implement our pre-training method on a…
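The bottom-up masking idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the 0.75 mask ratio, the grid size, and the scale factors are all assumptions chosen for illustration. The sketch samples a visibility mask on the coarsest patch grid (where the Transformer would operate) and upsamples it to finer CNN stages, so every stage hides the same 3D regions. Multiplying dense features by the mask is used here only as a stand-in for sparse convolution.

```python
import numpy as np


def bottom_up_masks(grid_shape, mask_ratio, scales, seed=0):
    """Sample a visibility mask on the coarsest 3D patch grid, then
    upsample it to each finer stage (hypothetical helper).

    Returns (keep, masks) where True means the position is visible.
    """
    rng = np.random.default_rng(seed)
    num_patches = int(np.prod(grid_shape))
    num_masked = int(round(mask_ratio * num_patches))

    # Start fully visible, then hide a random subset of patches.
    keep = np.ones(num_patches, dtype=bool)
    masked_idx = rng.choice(num_patches, size=num_masked, replace=False)
    keep[masked_idx] = False
    keep = keep.reshape(grid_shape)

    masks = {}
    for s in scales:
        m = keep
        # Nearest-neighbour upsampling by factor s along each spatial axis.
        for axis in range(3):
            m = np.repeat(m, s, axis=axis)
        masks[s] = m
    return keep, masks


def mask_cnn_features(features, mask):
    """Zero out (C, D, H, W) features at masked positions.

    Assumption: a dense stand-in for sparse convolution, which would
    skip the masked positions instead of computing on them.
    """
    return features * mask[None]


def visible_tokens(tokens, keep):
    """Keep only the unmasked patch tokens, shape (N_visible, dim),
    as input to the Transformer stage."""
    return tokens[keep.ravel()]


# Illustrative sizes (assumed): a 6x6x6 patch grid, 75% of patches masked.
keep, masks = bottom_up_masks((6, 6, 6), mask_ratio=0.75, scales=(1, 2, 4))
feats = np.ones((8, 24, 24, 24), dtype=np.float32)
masked_feats = mask_cnn_features(feats, masks[4])
tokens = np.zeros((6 * 6 * 6, 32), dtype=np.float32)
vis = visible_tokens(tokens, keep)
```

Because every finer mask is derived from the same coarse mask, a region hidden from the Transformer tokens is also hidden at every CNN stage, which is one way to read the "consistent masking" the abstract describes.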
Taxonomy
Topics: AI in cancer detection · Medical Image Segmentation Techniques · Advanced Image Processing Techniques
Methods: Linear Layer · Residual Connection · Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Convolution · Softmax · Absolute Position Encodings
