Self-Supervised Pretraining Improves Self-Supervised Pretraining
Colorado J. Reed, Xiangyu Yue, Ani Nrusimha, Sayna Ebrahimi, Vivek Vijaykumar, Richard Mao, Bo Li, Shanghang Zhang, Devin Guillory, Sean Metzger, Kurt Keutzer, Trevor Darrell

TL;DR
This paper introduces Hierarchical PreTraining (HPT), a method that accelerates self-supervised pretraining and enhances accuracy and robustness across diverse vision tasks by initializing from existing models.
Contribution
HPT is a novel framework that reduces pretraining time and improves performance by leveraging existing pretrained models for initialization.
Findings
HPT converges up to 80x faster.
HPT improves accuracy across 16 vision datasets.
HPT makes self-supervised pretraining more robust to changes in the data augmentation policy and in the amount of pretraining data.
Abstract
While self-supervised pretraining has proven beneficial for many computer vision tasks, it requires expensive and lengthy computation, large amounts of data, and is sensitive to data augmentation. Prior work demonstrates that models pretrained on datasets dissimilar to their target data, such as chest X-ray models trained on ImageNet, underperform models trained from scratch. Users that lack the resources to pretrain must use existing models with lower performance. This paper explores Hierarchical PreTraining (HPT), which decreases convergence time and improves accuracy by initializing the pretraining process with an existing pretrained model. Through experimentation on 16 diverse vision datasets, we show HPT converges up to 80x faster, improves accuracy across tasks, and improves the robustness of the self-supervised pretraining process to changes in the image augmentation policy or…
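The core idea in the abstract, starting a pretraining run from an existing model's weights instead of from a random initialization, can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration and is not the paper's setup: the "pretraining objective" is a toy quadratic, `pretrain`, `base_optimum`, and `target_optimum` are hypothetical names, and the "existing model" is simply a weight vector already fitted to a nearby objective. The sketch only shows why a related starting point can reach a fixed loss threshold in fewer steps; it makes no claim about the size of the speedup on real data.

```python
import random


def pretrain(init_weights, optimum, lr=0.1, tol=1e-3, max_steps=10_000):
    """Toy stand-in for a pretraining run (an assumption, not the paper's method).

    Runs gradient descent on 0.5 * ||w - optimum||^2 and returns
    (final_weights, number_of_steps_taken_to_reach_tol).
    """
    w = list(init_weights)
    for step in range(max_steps):
        loss = 0.5 * sum((wi - oi) ** 2 for wi, oi in zip(w, optimum))
        if loss < tol:
            return w, step
        # The gradient of the quadratic loss with respect to w is (w - optimum).
        w = [wi - lr * (wi - oi) for wi, oi in zip(w, optimum)]
    return w, max_steps


rng = random.Random(0)
dim = 32

# Hypothetical optima: a generic "base" objective and a nearby "target" objective.
base_optimum = [rng.gauss(0.0, 1.0) for _ in range(dim)]
target_optimum = [b + rng.gauss(0.0, 0.1) for b in base_optimum]

# The "existing pretrained model": weights already fitted to the base objective.
random_init = [rng.gauss(0.0, 1.0) for _ in range(dim)]
base_weights, _ = pretrain(random_init, base_optimum)

# Pretrain on the target objective from scratch vs. from the existing model.
_, steps_scratch = pretrain(random_init, target_optimum)
_, steps_warm = pretrain(base_weights, target_optimum)

print(f"steps from random init:    {steps_scratch}")
print(f"steps from existing model: {steps_warm}")
```

In this toy setting the warm-started run needs fewer steps because its starting point is already close to the target optimum. Whether and how much this carries over to real self-supervised pretraining is what the paper's experiments address.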
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition
