Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning
Amin Karimi Monsefi, Mengxi Zhou, Nastaran Karimi Monsefi, Ser-Nam Lim, Wei-Lun Chao, Rajiv Ramnath

TL;DR
This paper introduces FOLK, a frequency-guided SSL method that adaptively masks image frequencies and uses knowledge distillation, leading to improved pre-training efficiency and performance across multiple vision tasks.
Contribution
The paper proposes FOLK, a novel frequency-based SSL approach that adaptively selects masked frequencies and employs a two-branch framework with knowledge distillation, addressing limitations of prior fixed-frequency masking methods.
Findings
FOLK achieves competitive results on image classification tasks.
FOLK improves performance in few-shot learning scenarios.
FOLK enhances semantic segmentation accuracy.
Abstract
We present a novel frequency-based Self-Supervised Learning (SSL) approach that significantly enhances its efficacy for pre-training. Prior work in this direction masks out pre-defined frequencies in the input image and employs a reconstruction loss to pre-train the model. While achieving promising results, such an implementation has two fundamental limitations as identified in our paper. First, using pre-defined frequencies overlooks the variability of image frequency responses. Second, pre-trained with frequency-filtered images, the resulting model needs relatively more data to adapt to naturally looking images during fine-tuning. To address these drawbacks, we propose FOurier transform compression with seLf-Knowledge distillation (FOLK), integrating two dedicated ideas. First, inspired by image compression, we adaptively select the masked-out frequencies based on image frequency…
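The idea of choosing masked frequencies from the image's own spectrum, rather than from a fixed band, can be sketched as follows. This is a minimal illustration and not the authors' formulation: the function names, the `keep_ratio` parameter, and the top-k magnitude rule are all assumptions, and the link between these two masks and the "Com" and "RCom" filters mentioned in the reviews is only a guess.

```python
import numpy as np

def frequency_masks(image, keep_ratio=0.1):
    """Build two complementary, image-dependent frequency masks.

    Hypothetical helper: `keep_ratio` and the top-k magnitude rule are
    assumptions made for illustration, not taken from the paper.
    """
    spectrum = np.fft.fft2(image)          # 2-D Fourier transform
    magnitude = np.abs(spectrum)
    k = max(1, int(round(keep_ratio * magnitude.size)))
    # Magnitude of the k-th strongest frequency component.
    threshold = np.partition(magnitude.ravel(), -k)[-k]
    com_mask = magnitude >= threshold      # keeps the strongest components
    rcom_mask = ~com_mask                  # keeps everything else
    return spectrum, com_mask, rcom_mask

def apply_mask(spectrum, mask):
    """Zero out the unselected frequencies and return to pixel space."""
    return np.real(np.fft.ifft2(spectrum * mask))

rng = np.random.default_rng(0)
image = rng.random((8, 8))                 # stand-in for a grayscale image
spectrum, com_mask, rcom_mask = frequency_masks(image, keep_ratio=0.1)
compressed_view = apply_mask(spectrum, com_mask)
residual_view = apply_mask(spectrum, rcom_mask)
```

Because the two masks partition the spectrum, the two filtered views sum back to the original image. Which frequencies are masked changes from image to image, in contrast to a pre-defined low-pass or high-pass band.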
Peer Reviews
Decision: ICLR 2025 Poster
**Originality**. The paper investigated two fundamental limitations in the MFM work and proposed two novel designs to address these limitations. The presentation clearly shows what are the novel elements. **Quality**. The paper shows a successful way to perform masking in the frequency domain for unlabeled training images. Additionally, the authors provided a proper self-knowledge distillation framework to deal with the negative effect of training with frequency-masked images. **Clarity**.
**Training Cost**. Given that the proposed method employs a two-branch framework for model training, will it incur additional training cost compared with the original MFM? **Masking Filters**. What are the exact formulations of Com and RCom masking? Pseudocode for constructing Com and RCom would be helpful. **Data Augmentations**. In generating the two views, u and v, distinct transformations (random cropping, color jittering, etc.) are applied. It seems no ablation studies are provided for an
1. The framework is applicable and straightforward to understand. 2. The proposed method improves the learning of the student model and facilitates a more efficient training process. 3. The paper presents experiments across multiple datasets and various vision tasks, demonstrating the effectiveness of the proposed method.
1. The dual-stream and frequency-domain masking approaches applied in the article are relatively common schemes. Could the authors elaborate further on the motivation of the proposed method? 2. More analysis and experiments are required on the framework design and computational cost; please see the questions.
The paper presents a new method that combines frequency-based masking with self-knowledge distillation, addressing known limitations in the field of SSL for computer vision tasks. The paper provides extensive experimental results that demonstrate FOLK's effectiveness across a range of tasks and benchmarks, showing improvements over existing state-of-the-art methods.
The authors identify two limitations in the introduction, but the experiments do not directly show how the proposed method addresses them. Simply showing performance improvements (e.g., on image classification tasks) is not enough to support the authors' claims.
Taxonomy
Topics: Image Processing Techniques and Applications
