Accelerating Augmentation Invariance Pretraining

Jinhong Lin; Cheng-En Wu; Yibing Wei; Pedro Morgado

arXiv:2410.22364 · cs.CV · November 1, 2024

TL;DR

This paper introduces an acceleration framework for contrastive pretraining of Vision Transformers (ViTs). It uses sequence compression to cut training time substantially (up to 4x for MoCo on ImageNet) while maintaining downstream performance.
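To see roughly why shorter token sequences reduce training cost, the sketch below estimates the forward FLOPs of a ViT-B-sized encoder. It assumes 12 blocks, width 768, an MLP ratio of 4, and one [CLS] token. It counts only the dominant matrix multiplies, so it is a back-of-the-envelope estimate and not the paper's measured speedup. Data loading, augmentation, and the projection heads are all ignored. `vit_forward_flops` and `token_count` are hypothetical helpers written for this illustration.

```python
def vit_forward_flops(n_tokens: int, dim: int = 768, depth: int = 12, mlp_ratio: int = 4) -> int:
    """Approximate forward FLOPs of a ViT encoder, counting 2 FLOPs per multiply-add.

    Per block: Q/K/V and output projections (4*N*D^2 MACs), the MLP
    (2*mlp_ratio*N*D^2 MACs), and attention scores plus the weighted sum
    over values (2*N^2*D MACs). Patch embedding, norms, softmax and heads are ignored.
    """
    n, d = n_tokens, dim
    macs_per_block = 4 * n * d * d + 2 * mlp_ratio * n * d * d + 2 * n * n * d
    return 2 * depth * macs_per_block


def token_count(image_size: int, patch_size: int, keep_ratio: float = 1.0) -> int:
    """Patch tokens kept after random dropout, plus one [CLS] token."""
    n_patches = (image_size // patch_size) ** 2
    return int(round(n_patches * keep_ratio)) + 1


full = vit_forward_flops(token_count(224, 16))                     # 196 patches + CLS
dropped = vit_forward_flops(token_count(224, 16, keep_ratio=0.5))  # 98 patches + CLS
coarse = vit_forward_flops(token_count(224, 32))                   # 49 patches + CLS

print(f"50% token dropout: {full / dropped:.2f}x fewer FLOPs")
print(f"patch size 16 -> 32: {full / coarse:.2f}x fewer FLOPs")
```

Under these assumptions, the two settings come out at roughly 2x and 4x fewer FLOPs. At N of about 200 and D = 768, the linear projections and MLP dominate, so cost tracks token count almost proportionally. The quadratic attention term matters more at longer sequences.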

Contribution

It proposes a novel sequence-compression-based acceleration method and an optimal acceleration schedule for training ViTs efficiently without loss in performance.

Findings

1. Achieves a 4x speedup for MoCo on ImageNet
2. Reduces DINO training time by more than 2.5x
3. Analyzes the gradient estimation error of different acceleration strategies and the trade-off between acceleration and downstream performance
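The toy illustration below makes "gradient estimation error" concrete. It is our own construction and not the paper's analysis. A linear head on mean-pooled tokens stands in for a ViT (a hypothetical simplification), with loss 0.5 * ||W @ mean(tokens) - t||^2. A gradient computed from a random subset of tokens is a cheaper but noisier estimate of the full-sequence gradient. The relative error grows as fewer tokens are kept. In a real ViT, dropping tokens also changes self-attention, so the error will not follow this simple sampling model exactly.

```python
import numpy as np


def grad_mean_pooled(W: np.ndarray, tokens: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Gradient of 0.5 * ||W @ mean(tokens) - target||^2 with respect to W."""
    m = tokens.mean(axis=0)
    return np.outer(W @ m - target, m)


def relative_grad_error(keep_ratio: float, n_tokens: int = 196, dim: int = 64,
                        trials: int = 200, seed: int = 0) -> float:
    """Mean relative error ||g_hat - g|| / ||g|| when only a random token subset is used."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((dim, dim)) / np.sqrt(dim)
    errors = []
    for _ in range(trials):
        tokens = rng.standard_normal((n_tokens, dim)) + 1.0  # nonzero mean keeps ||g|| away from 0
        target = rng.standard_normal(dim)
        g_full = grad_mean_pooled(W, tokens, target)
        n_keep = max(1, int(round(n_tokens * keep_ratio)))
        kept = rng.choice(n_tokens, size=n_keep, replace=False)  # random token dropout
        g_hat = grad_mean_pooled(W, tokens[kept], target)
        errors.append(np.linalg.norm(g_hat - g_full) / np.linalg.norm(g_full))
    return float(np.mean(errors))


for r in (1.0, 0.5, 0.25, 0.1):
    print(f"keep {r:.2f}: relative gradient error {relative_grad_error(r):.3f}")
```

Each step with fewer tokens is cheaper, but its gradient is noisier. This is the cost-versus-accuracy tension that an acceleration schedule has to balance.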

Abstract

Our work tackles the computational challenges of contrastive learning methods, particularly for the pretraining of Vision Transformers (ViTs). Despite the effectiveness of contrastive learning, the substantial computational resources required for training often hinder their practical application. To mitigate this issue, we propose an acceleration framework, leveraging ViT's unique ability to generalize across inputs of varying sequence lengths. Our method employs a mix of sequence compression strategies, including randomized token dropout and flexible patch scaling, to reduce the cost of gradient estimation and accelerate convergence. We further provide an in-depth analysis of the gradient estimation error of various acceleration strategies as well as their impact on downstream tasks, offering valuable insights into the trade-offs between acceleration and performance. We also propose…
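The two sequence-compression strategies named in the abstract can be sketched on raw patch tensors. This is a minimal NumPy sketch under our own assumptions, not the authors' code. `patchify`, `random_token_dropout`, and `scaled_patches` are hypothetical helpers. Dropout keeps a uniformly random subset of patch tokens for each image. "Patch scaling" is approximated by average-pooling the image so that a fixed 16x16 patch embedding covers a 32x32 region. The paper's actual patch-scaling mechanism (for example, resizing the patch-embedding weights) may differ.

```python
import numpy as np


def patchify(images: np.ndarray, patch_size: int) -> np.ndarray:
    """Split (B, H, W, C) images into (B, N, patch_size * patch_size * C) patch vectors."""
    b, h, w, c = images.shape
    gh, gw = h // patch_size, w // patch_size
    x = images[:, : gh * patch_size, : gw * patch_size]
    x = x.reshape(b, gh, patch_size, gw, patch_size, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (B, gh, gw, p, p, C)
    return x.reshape(b, gh * gw, patch_size * patch_size * c)


def random_token_dropout(tokens: np.ndarray, keep_ratio: float, rng: np.random.Generator):
    """Keep a uniformly random subset of tokens, drawn independently for each sample.

    Returns the kept tokens and their indices (sorted, i.e. in raster order) so that
    the matching positional embeddings can be gathered the same way.
    """
    b, n, _ = tokens.shape
    n_keep = max(1, int(round(n * keep_ratio)))
    # argsort of i.i.d. uniform noise yields an independent random permutation per row
    idx = np.sort(np.argsort(rng.random((b, n)), axis=1)[:, :n_keep], axis=1)
    kept = np.take_along_axis(tokens, idx[:, :, None], axis=1)
    return kept, idx


def scaled_patches(images: np.ndarray, base_patch: int, scale: int) -> np.ndarray:
    """One token per (scale * base_patch)^2 region: average-pool by `scale`, then
    patchify with `base_patch`, so the token dimension matches the base embedding."""
    b, h, w, c = images.shape
    h2, w2 = h // scale, w // scale
    x = images[:, : h2 * scale, : w2 * scale].reshape(b, h2, scale, w2, scale, c)
    return patchify(x.mean(axis=(2, 4)), base_patch)


rng = np.random.default_rng(0)
images = rng.random((2, 224, 224, 3), dtype=np.float32)

patches = patchify(images, 16)                                       # (2, 196, 768)
kept, idx = random_token_dropout(patches, keep_ratio=0.25, rng=rng)  # (2, 49, 768)
coarse = scaled_patches(images, base_patch=16, scale=2)              # (2, 49, 768)
print(patches.shape, kept.shape, coarse.shape)
```

Both routes shorten the sequence from 196 to 49 tokens. Dropout discards image regions, while patch scaling keeps the whole image at a coarser resolution. In a full ViT, the returned `idx` would also select the matching positional embeddings before the encoder runs.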

Videos

Accelerating Augmentation Invariance Pretraining · SlidesLive

Taxonomy

Topics: Advanced Data Compression Techniques · Speech Recognition and Synthesis · Neural Networks and Applications

Methods: Average Pooling · Max Pooling · Global Average Pooling · Convolution · Kaiming Initialization · Color Jitter · Softmax · Linear Layer · Random Resized Crop