SEPT: Towards Scalable and Efficient Visual Pre-Training

Yiqi Lin; Huabin Zheng; Huaping Zhong; Jinjing Zhu; Weijia Li; Conghui He; Lin Wang

arXiv:2212.05473·cs.CV·December 13, 2022

TL;DR

SEPT introduces a scalable, efficient visual pre-training framework that selects similar unlabeled samples for target tasks, reducing data and computational costs while maintaining or improving performance.

Contribution

The paper proposes a novel data selection-based pre-training framework, SEPT, that leverages feature similarity for scalable and efficient visual model pre-training without extra annotations.

Findings

01. Achieves competitive or better performance than ImageNet pre-training.

02. Reduces training data size by an order of magnitude.

03. Demonstrates high scalability and efficiency across various downstream tasks.

Abstract

Recently, the self-supervised pre-training paradigm has shown great potential in leveraging large-scale unlabeled data to improve downstream task performance. However, increasing the scale of unlabeled pre-training data in real-world scenarios requires prohibitive computational costs and faces the challenge of uncurated samples. To address these issues, we build a task-specific self-supervised pre-training framework from a data selection perspective, based on the simple hypothesis that pre-training on unlabeled samples whose distribution is similar to that of the target task can bring substantial performance gains. Buttressed by this hypothesis, we propose the first yet novel framework for Scalable and Efficient visual Pre-Training (SEPT) by introducing a retrieval pipeline for data selection. SEPT first leverages a self-supervised pre-trained model to extract the features of the entire unlabeled…
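The abstract is cut off before the retrieval step is described, so the following is only a minimal sketch of what similarity-based data selection could look like, not the authors' pipeline. It assumes features have already been extracted by some pre-trained encoder, uses cosine similarity as the similarity measure, and keeps the top-k pool samples per target sample; the function names (`l2_normalize`, `cosine_similarity`, `select_similar`), the choice of cosine similarity, and the per-target top-k rule are all hypothetical choices made for illustration.

```python
import math


def l2_normalize(vec):
    """Return vec scaled to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def select_similar(pool_features, target_features, k):
    """Hypothetical selection step (an assumption, not the paper's method):
    for each target feature, keep the k most similar pool samples and
    return the sorted union of their indices."""
    selected = set()
    for t in target_features:
        scores = [(cosine_similarity(t, p), i) for i, p in enumerate(pool_features)]
        # Highest similarity first; ties broken by the lower pool index.
        scores.sort(key=lambda s: (-s[0], s[1]))
        selected.update(i for _, i in scores[:k])
    return sorted(selected)


# Toy 2-D features standing in for the output of a pre-trained encoder.
pool = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0], [0.1, 0.9]]
targets = [[1.0, 0.05], [0.05, 1.0]]
subset = select_similar(pool, targets, k=2)
print(subset)
```

In this toy run the pool sample pointing away from both targets is left out of `subset`, which is the intended effect: only the unlabeled samples closest to the target data would be passed on to pre-training. A brute-force scan like this is quadratic in practice; at the scale the abstract mentions, an approximate nearest-neighbor index would presumably be needed.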

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications