Improving Representation of High-frequency Components for Medical Visual Foundation Models
Yuetan Chu, Yilan Zhang, Zhongyi Han, Changchun Yang, Longxi Zhou, Gongning Luo, Chao Huang, Xin Gao

TL;DR
This paper introduces Frepa, a pretraining strategy that improves how medical visual foundation models represent high-frequency components, which in turn improves performance on medical imaging tasks that depend on fine-grained detail.
Contribution
The paper proposes Frepa, a pretraining method that combines high-frequency masking, low-frequency perturbation, and adversarial learning, and that extends to various architectures and imaging modalities.
Findings
Frepa outperforms existing self-supervised methods without fine-tuning.
Achieves up to a +15% DSC improvement in retinal vessel segmentation.
Enables better high-frequency feature preservation in embeddings.
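DSC is the Dice similarity coefficient, the standard overlap metric for segmentation: for a predicted mask A and a reference mask B it is 2|A ∩ B| / (|A| + |B|). The NumPy sketch below shows how it is typically computed for a pair of binary masks. It is only an illustration of the metric; the paper's exact evaluation protocol (thresholding, per-image versus dataset-level averaging) is not stated here, and the function name `dice_coefficient` and the `eps` smoothing term are assumptions of this sketch.

```python
import numpy as np


def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) for two binary masks.

    `eps` (an assumption of this sketch) avoids division by zero and makes
    two empty masks score 1.0.
    """
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    return float((2.0 * intersection + eps) / (total + eps))


# Usage: two 2x3 masks that overlap in 2 of their 3 foreground pixels each.
pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
print(round(dice_coefficient(pred, target), 4))  # 0.6667
```

Because DSC rewards overlap on thin structures such as retinal vessels, small boundary errors lower it noticeably, which is why it is sensitive to how well fine, high-frequency detail is captured.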
Abstract
Foundation models have recently attracted significant attention for their impressive generalizability across diverse downstream tasks. However, these models have been shown to have significant limitations in representing high-frequency components and fine-grained details. In many medical imaging tasks, the precise representation of such information is crucial due to the inherently intricate anatomical structures, sub-visual features, and complex boundaries involved. Consequently, the limited representational capacity of prevalent foundation models can result in significant performance degradation or even failure in these tasks. To address these challenges, we propose a novel pretraining strategy, named Frequency-advanced Representation Autoencoder (Frepa). Through high-frequency masking and low-frequency perturbation combined with adversarial learning, Frepa encourages the encoder to effectively…
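To make "high-frequency masking" and "low-frequency perturbation" concrete, the sketch below applies both corruptions to a 2D image in the Fourier domain using NumPy. It is a hypothetical illustration, not the authors' implementation: the circular low/high split, the `radius`, `high_mask_ratio`, and `low_noise_std` parameters, and the choice of multiplicative noise are all assumptions, and the adversarial-learning component and the encoder/decoder are not shown.

```python
import numpy as np


def radial_lowpass_mask(h: int, w: int, radius: float) -> np.ndarray:
    """True for frequencies within `radius` of the centered DC component (hypothetical split)."""
    yy, xx = np.ogrid[:h, :w]
    cy, cx = h // 2, w // 2  # fftshift places zero frequency at index n // 2
    return (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2


def frequency_corrupt(image, radius=8.0, high_mask_ratio=0.5, low_noise_std=0.1, rng=None):
    """Hypothetical corruption: drop random high-frequency coefficients and
    perturb low-frequency coefficients, then transform back to image space."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))  # DC moved to the centre
    low = radial_lowpass_mask(h, w, radius)
    high = ~low

    # High-frequency masking: zero a random fraction of high-frequency coefficients.
    drop = high & (rng.random((h, w)) < high_mask_ratio)
    spectrum[drop] = 0

    # Low-frequency perturbation: multiplicative Gaussian noise on low-frequency coefficients.
    scale = 1.0 + low_noise_std * rng.standard_normal((h, w))
    spectrum[low] *= scale[low]

    # Random masking breaks conjugate symmetry, so the inverse FFT is complex;
    # keeping the real part yields a real-valued corrupted image.
    return np.fft.ifft2(np.fft.ifftshift(spectrum)).real


# Usage: corrupt a random 64x64 "image"; a model would then be trained to
# recover detail from inputs like `corrupted`.
rng = np.random.default_rng(0)
img = rng.random((64, 64))
corrupted = frequency_corrupt(img, rng=rng)
print(corrupted.shape)  # (64, 64)
```

The intuition this sketch tries to capture is that removing high-frequency content from the input, while also disturbing the low-frequency content, prevents a model from relying only on coarse structure, so an encoder trained to undo such corruption is pushed to encode fine detail.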
Taxonomy
Topics: Medical Imaging Techniques and Applications · Advanced Data Processing Techniques · Engineering Technology and Methodologies
Methods: Attention Is All You Need · Stochastic Depth · Swin Transformer · Byte Pair Encoding · Layer Normalization · Label Smoothing · Linear Layer · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings
