HySparK: Hybrid Sparse Masking for Large Scale Medical Image Pre-Training

Fenghe Tang; Ronghao Xu; Qingsong Yao; Xueming Fu; Quan Quan; Heqin Zhu; Zaiyi Liu; S. Kevin Zhou

arXiv:2408.05815·cs.CV·August 13, 2024

TL;DR

HySparK introduces a hybrid CNN-Transformer pre-training method using sparse masking for large-scale 3D medical images, improving representation learning and transferability in downstream tasks.

Contribution

This work presents a hybrid sparse masking pre-training strategy that combines CNNs and Transformers for 3D medical images, a direction that had received limited attention in prior end-to-end pre-training work.
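On the CNN side, HySparK uses sparse convolution so that masked regions neither feed features into visible ones nor receive features from them. A common way to emulate this with ordinary dense operators, used in earlier sparse-masking work such as SparK, is to zero the masked positions before each convolution and zero them again afterwards. The pure-Python 1D sketch below illustrates only that idea: `conv1d_same` and `masked_conv1d` are hypothetical helpers, not functions from the HySparK repository, and a real 3D implementation would rely on a deep-learning framework or a dedicated sparse-convolution library.

```python
from typing import List


def conv1d_same(x: List[float], kernel: List[float]) -> List[float]:
    """Dense 1D cross-correlation with zero padding; output length equals input length
    (centred for odd-length kernels)."""
    pad = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - pad
            if 0 <= idx < len(x):  # positions outside the signal count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def masked_conv1d(x: List[float], kernel: List[float], visible: List[bool]) -> List[float]:
    """Emulated sparse convolution (hypothetical helper, not the HySparK code).

    1. Zero the masked inputs so they contribute nothing to their neighbours.
    2. Run an ordinary dense convolution.
    3. Zero the masked outputs again so features never grow into masked regions.
    """
    x_in = [v if keep else 0.0 for v, keep in zip(x, visible)]
    y = conv1d_same(x_in, kernel)
    return [v if keep else 0.0 for v, keep in zip(y, visible)]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
visible = [True, True, False, False, True, True]
print(masked_conv1d(x, [1.0, 1.0, 1.0], visible))  # [3.0, 3.0, 0.0, 0.0, 11.0, 11.0]
```

With this mask and kernel, a dense convolution of the zero-filled input writes 2.0 and 5.0 into the two masked positions, whereas `masked_conv1d` keeps them at zero; without the re-masking step, that leakage would widen with every stacked layer.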

Findings

01. Robust transferability to supervised downstream tasks.
02. Effective dense multi-scale feature reconstruction.
03. Promising results on large-scale 3D medical datasets.

Abstract

The generative self-supervised learning strategy exhibits remarkable representation-learning capabilities. However, little attention has been paid to end-to-end pre-training methods for hybrid CNN-Transformer architectures, which can learn strong local and global representations simultaneously. To address this issue, we propose a generative pre-training strategy called Hybrid Sparse masKing (HySparK), based on masked image modeling, and apply it to large-scale pre-training on medical images. First, we apply a bottom-up 3D hybrid masking strategy to the encoder to keep the masking consistent. We then use sparse convolution in the top CNN stages and encode only the unmasked patches in the bottom vision Transformers. Second, we employ a simple hierarchical decoder with skip connections to achieve dense multi-scale feature reconstruction. Third, we implement our pre-training method on a…
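One plausible reading of the bottom-up 3D hybrid masking step is sketched below: sample a random mask once on the coarsest token grid used by the bottom Transformer, then upsample it by nearest neighbour to the finer grids of the top CNN stages, so that every stage hides the same physical region. This is an illustrative sketch under stated assumptions, not the authors' code: the grid size, the 75% mask ratio, the upsampling factors, and the helper names (`sample_token_mask`, `upsample_mask`, `visible_token_indices`) are all hypothetical.

```python
import random
from typing import List, Tuple

# Hypothetical representation: nested lists indexed [depth][height][width],
# True = visible, False = masked.
Mask3D = List[List[List[bool]]]


def sample_token_mask(grid: Tuple[int, int, int], mask_ratio: float, seed: int = 0) -> Mask3D:
    """Hide round(mask_ratio * N) of the N tokens on the coarsest (Transformer) grid."""
    d, h, w = grid
    n = d * h * w
    rng = random.Random(seed)
    masked = set(rng.sample(range(n), round(mask_ratio * n)))
    # Row-major flat index (z * h + y) * w + x, matching visible_token_indices below.
    return [[[((z * h + y) * w + x) not in masked for x in range(w)]
             for y in range(h)]
            for z in range(d)]


def upsample_mask(mask: Mask3D, factor: int) -> Mask3D:
    """Nearest-neighbour upsampling: each coarse token covers a factor**3 block of voxels."""
    d, h, w = len(mask), len(mask[0]), len(mask[0][0])
    return [[[mask[z // factor][y // factor][x // factor]
              for x in range(w * factor)]
             for y in range(h * factor)]
            for z in range(d * factor)]


def visible_token_indices(mask: Mask3D) -> List[int]:
    """Flat indices of visible tokens: the only tokens the Transformer would encode."""
    flat = [v for plane in mask for row in plane for v in row]
    return [i for i, keep in enumerate(flat) if keep]


# Demo with hypothetical settings: a 2x2x2 token grid, 75% masking,
# and CNN stages at 2x and 4x the token resolution.
token_mask = sample_token_mask((2, 2, 2), mask_ratio=0.75, seed=0)
stage_masks = {f: upsample_mask(token_mask, f) for f in (2, 4)}
print(len(visible_token_indices(token_mask)))  # 2 of 8 tokens stay visible
for f, m in stage_masks.items():
    print(f, sum(v for plane in m for row in plane for v in row))  # 2 * f**3 visible voxels
```

Because each fine voxel copies its parent token's flag, a region is either visible at every scale or masked at every scale, which is presumably the kind of consistency the abstract refers to; the Transformer would then encode only the tokens returned by `visible_token_indices`.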

Code & Models

Repositories

fenghetan9/hyspark
PyTorch · Official

Taxonomy

Topics: AI in cancer detection · Medical Image Segmentation Techniques · Advanced Image Processing Techniques

Methods: Linear Layer · Residual Connection · Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Convolution · Softmax · Absolute Position Encodings