Pre-training on High Definition X-ray Images: An Experimental Study

Xiao Wang; Yuehang Li; Wentao Wu; Jiandong Jin; Yao Rong; Bo Jiang; Chuanfu Li; Jin Tang

arXiv:2404.17926·eess.IV·April 30, 2024

TL;DR

This paper introduces a high-resolution, large-scale X-ray pre-trained vision model using a novel masking strategy, significantly improving performance on report generation and disease recognition tasks.

Contribution

It presents the first high-definition X-ray vision model pre-trained on more than 1 million images, using a novel context-aware masking strategy to improve downstream task performance.

Findings

01

Achieves state-of-the-art results on benchmark datasets.

02

Demonstrates effectiveness of high-resolution pre-training.

03

Validates that the proposed masking strategy improves model learning.

Abstract

Existing X-ray based pre-trained vision models are usually trained on relatively small-scale datasets (fewer than 500k samples) with limited resolution (e.g., 224 $\times$ 224). However, the key to the success of self-supervised pre-training of large models lies in massive training data, and maintaining high resolution in X-ray images is essential for effectively handling difficult and miscellaneous diseases. In this paper, we address these issues by proposing the first high-definition (1280 $\times$ 1280) X-ray based pre-trained foundation vision model on our newly collected large-scale dataset, which contains more than 1 million X-ray images. Our model follows the masked auto-encoder framework: the tokens that remain after masking (at a high rate) are used as input, and the masked image patches are reconstructed by the Transformer encoder-decoder network. …
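To make the token arithmetic behind the abstract concrete, here is a minimal sketch of masked auto-encoder style input selection at 1280 $\times$ 1280. It is an illustration only: the patch size of 16 and the mask ratio of 0.75 are assumptions not stated in the abstract, the helper names `patch_grid` and `random_mask` are hypothetical, and the sampling is plain uniform random masking rather than the paper's context-aware strategy.

```python
import random


def patch_grid(image_size: int, patch_size: int) -> int:
    """Return the number of patches along one side of a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    return image_size // patch_size


def random_mask(num_tokens: int, mask_ratio: float, seed: int = 0):
    """Split token indices into (visible, masked) by uniform random sampling.

    This is generic MAE-style masking, used here as a stand-in; it is NOT
    the context-aware masking strategy proposed in the paper.
    """
    rng = random.Random(seed)
    indices = list(range(num_tokens))
    rng.shuffle(indices)
    num_keep = int(num_tokens * (1.0 - mask_ratio))
    visible = sorted(indices[:num_keep])   # tokens fed to the encoder
    masked = sorted(indices[num_keep:])    # patches the decoder must reconstruct
    return visible, masked


# Assumed patch size of 16 pixels (hypothetical; not given in the abstract).
side = patch_grid(1280, 16)
num_tokens = side * side

# Assumed mask ratio of 0.75; the abstract only says "a high rate".
visible, masked = random_mask(num_tokens, mask_ratio=0.75)

print(f"patches per side: {side}")
print(f"total tokens:     {num_tokens}")
print(f"visible tokens:   {len(visible)}")
print(f"masked tokens:    {len(masked)}")
```

Under these assumptions a 1280 $\times$ 1280 image yields 6400 patch tokens, compared with 196 for a 224 $\times$ 224 image at the same patch size, which is one reason a high mask rate matters for keeping the encoder input manageable.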

Code & Models

Repositories

event-ahu/medical_image_analysis
PyTorch · Official

Taxonomy

Topics: Radiomics and Machine Learning in Medical Imaging · Medical Imaging Techniques and Applications

Methods: Attention Is All You Need · L1 Regularization · Dropout · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Linear Layer · Dense Connections