FastDINOv2: Frequency Based Curriculum Learning Improves Robustness and Training Speed
Jiaqi Zhang, Juntuo Wang, Zhixin Sun, John Zou, Randall Balestriero

TL;DR
FastDINOv2 introduces a frequency filtering curriculum and noise augmentation that significantly accelerates training and enhances robustness of vision models, making large-scale self-supervised learning more efficient and resilient.
Contribution
The paper proposes a frequency-based curriculum, in which low-frequency image content is seen first, combined with Gaussian noise patching augmentation, to speed up DINOv2 pre-training and improve robustness to common corruptions.
Findings
Pre-training time is reduced by 1.6x and FLOPs by 2.25x for a ViT-B/16 backbone trained on ImageNet-1K.
Matches the baseline's robustness on the ImageNet-C corruption benchmark.
Maintains competitive linear probing performance.
Abstract
Large-scale vision foundation models such as DINOv2 boast impressive performances by leveraging massive architectures and training datasets. But numerous scenarios require practitioners to reproduce those pre-training solutions, such as on private data, new modalities, or simply for scientific questioning--which is currently extremely demanding computation-wise. We thus propose a novel pre-training strategy for DINOv2 that simultaneously accelerates convergence--and strengthens robustness to common corruptions as a by-product. Our approach involves a frequency filtering curriculum--low-frequency being seen first--and the Gaussian noise patching augmentation. Applied to a ViT-B/16 backbone trained on ImageNet-1K, while pre-training time and FLOPs are reduced by 1.6x and 2.25x, our method still achieves matching robustness in corruption benchmarks (ImageNet-C) and maintains competitive…
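The two ingredients named in the abstract, a low-frequency-first curriculum and Gaussian noise patching, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the FFT box mask, the function names (`low_pass`, `gaussian_noise_patch`, `curriculum_view`), the choice to add noise inside one random patch, the stage ordering, and all default values (`low_freq_share`, `keep_fraction`, `patch_frac`, `sigma`) are assumptions made for illustration only.

```python
import numpy as np


def low_pass(image, keep_fraction):
    """Zero out spatial frequencies above keep_fraction of the Nyquist limit.

    image: array of shape (H, W) or (H, W, C).
    keep_fraction=1.0 keeps every frequency and returns the input unchanged.
    """
    h, w = image.shape[:2]
    fy = np.abs(np.fft.fftfreq(h))[:, None]  # cycles per pixel, in [0, 0.5]
    fx = np.abs(np.fft.fftfreq(w))[None, :]
    mask = (fy <= 0.5 * keep_fraction) & (fx <= 0.5 * keep_fraction)
    if image.ndim == 3:
        mask = mask[..., None]  # broadcast the mask over channels
    spectrum = np.fft.fft2(image, axes=(0, 1))
    return np.fft.ifft2(spectrum * mask, axes=(0, 1)).real


def gaussian_noise_patch(image, rng, patch_frac=0.3, sigma=0.2):
    """Add zero-mean Gaussian noise inside one randomly placed patch.

    Assumption: noise is added to the patch rather than replacing it.
    """
    out = image.astype(np.float64)  # astype returns a copy
    h, w = out.shape[:2]
    ph = max(1, int(h * patch_frac))
    pw = max(1, int(w * patch_frac))
    top = rng.integers(0, h - ph + 1)
    left = rng.integers(0, w - pw + 1)
    patch = out[top:top + ph, left:left + pw]
    out[top:top + ph, left:left + pw] = patch + rng.normal(0.0, sigma, patch.shape)
    return out


def curriculum_view(image, step, total_steps, rng,
                    low_freq_share=0.75, keep_fraction=0.25):
    """Hypothetical two-stage schedule: low-pass views first, then
    full-frequency views with Gaussian noise patching."""
    if step < low_freq_share * total_steps:
        return low_pass(image, keep_fraction)
    return gaussian_noise_patch(image, rng)
```

A data loader would call `curriculum_view(image, step, total_steps, rng)` on each sample with `rng = np.random.default_rng(seed)`. Early steps then see only coarse image structure, and later steps see full-frequency images with a noisy patch. In practice, low-pass filtering by downsampling would also shrink the input and hence the compute per step, which the FFT mask used here does not do.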
Taxonomy
Topics: Experimental Learning in Engineering
