Accelerating Augmentation Invariance Pretraining
Jinhong Lin, Cheng-En Wu, Yibing Wei, Pedro Morgado

TL;DR
This paper introduces an acceleration framework for contrastive learning of Vision Transformers that uses sequence compression to cut training time by up to 4x while maintaining performance.
Contribution
It proposes a novel sequence compression-based acceleration method and an optimal schedule for training ViTs efficiently without performance loss.
Findings
Achieves 4x speedup in MoCo on ImageNet
Reduces training time by over 2.5x in DINO
Provides analysis of gradient error and performance trade-offs
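A rough per-block FLOP count for a ViT encoder shows why a shorter token sequence can produce large speedups. The sketch below uses the standard approximation of 24·N·D² + 4·N²·D FLOPs per block, where N is the number of tokens and D is the width. It assumes an MLP ratio of 4 and counts 2 FLOPs per multiply-add. It ignores layer norms, softmax, and the momentum or teacher encoders used by MoCo and DINO. The function name, the ViT-B width, and the token counts are illustrative assumptions, and the numbers are not meant to reproduce the measured speedups above.

```python
def vit_layer_flops(n_tokens, dim, mlp_ratio=4):
    """Approximate forward FLOPs of one ViT encoder block (hypothetical helper).

    Counts 2 FLOPs per multiply-add; ignores layer norm, softmax, and biases.
    """
    proj = 2 * n_tokens * dim * (4 * dim)             # Q, K, V and output projections
    mlp = 2 * n_tokens * dim * (2 * mlp_ratio * dim)  # the two MLP linear layers
    attn = 2 * 2 * n_tokens * n_tokens * dim          # Q @ K^T and attention @ V
    return proj + mlp + attn


D = 768  # assumed ViT-B width
# Assumed token counts at 224px with 16px patches, each including one [CLS] token:
# 196 + 1 (full), 98 + 1 (half the patches dropped), 49 + 1 (quarter kept / 2x patch scale)
base = vit_layer_flops(197, D)
for n in (197, 99, 50):
    f = vit_layer_flops(n, D)
    print(f"N={n:3d}  GFLOPs/block={f / 1e9:.3f}  speedup vs full={base / f:.2f}x")
```

The linear terms dominate at these sequence lengths, so the estimated saving grows roughly in proportion to the fraction of tokens removed. Real wall-clock gains also depend on the hardware and on the parts of the training step that this count leaves out.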
Abstract
Our work tackles the computational challenges of contrastive learning methods, particularly for the pretraining of Vision Transformers (ViTs). Despite the effectiveness of contrastive learning, the substantial computational resources required for training often hinder their practical application. To mitigate this issue, we propose an acceleration framework, leveraging ViT's unique ability to generalize across inputs of varying sequence lengths. Our method employs a mix of sequence compression strategies, including randomized token dropout and flexible patch scaling, to reduce the cost of gradient estimation and accelerate convergence. We further provide an in-depth analysis of the gradient estimation error of various acceleration strategies as well as their impact on downstream tasks, offering valuable insights into the trade-offs between acceleration and performance. We also propose…
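A small NumPy sketch can illustrate the two sequence compression strategies named in the abstract. This is not the authors' implementation. The helper names (`patchify`, `random_token_dropout`, `patch_scaling`), the 224x224 input, the 16-pixel base patch, and the keep ratio are hypothetical choices. Patch scaling is emulated here by average-pooling the image before patchifying, which keeps the token dimension fixed. This is a simplification of resizing the patch embedding.

```python
import numpy as np


def patchify(img, patch):
    """Split an (H, W, C) image into an (N, patch*patch*C) array of flattened patches."""
    h, w, c = img.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    gh, gw = h // patch, w // patch
    x = img.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(gh * gw, patch * patch * c)


def random_token_dropout(tokens, keep_ratio, rng):
    """Keep a uniformly random subset of tokens (hypothetical helper).

    Also returns the kept positions, which a ViT would use to pick positional embeddings.
    """
    n = tokens.shape[0]
    n_keep = max(1, int(round(n * keep_ratio)))
    idx = np.sort(rng.permutation(n)[:n_keep])
    return tokens[idx], idx


def patch_scaling(img, base_patch, scale):
    """Emulate a patch size of base_patch * scale (hypothetical helper).

    Average-pools the image by `scale` and then patchifies with base_patch, so every
    token keeps the same dimension (an assumed simplification).
    """
    h, w, c = img.shape
    assert h % scale == 0 and w % scale == 0
    pooled = img.reshape(h // scale, scale, w // scale, scale, c).mean(axis=(1, 3))
    return patchify(pooled, base_patch)


rng = np.random.default_rng(0)
img = rng.random((224, 224, 3)).astype(np.float32)

full = patchify(img, 16)                                   # 14 x 14 = 196 tokens
dropped, kept_idx = random_token_dropout(full, 0.25, rng)  # 49 tokens
scaled = patch_scaling(img, 16, 2)                         # 7 x 7 = 49 tokens

for name, seq in [("full", full), ("dropout", dropped), ("scaled", scaled)]:
    n = seq.shape[0]
    print(f"{name:8s} tokens={n:4d}  attention cost vs full={(n / full.shape[0]) ** 2:.4f}")
```

In this sketch both strategies shorten the sequence from 196 to 49 tokens, but they discard information differently. Token dropout keeps a random quarter of the patches at full resolution. Patch scaling keeps the whole image at a coarser resolution.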
Taxonomy
Topics: Advanced Data Compression Techniques · Speech Recognition and Synthesis · Neural Networks and Applications
Methods: Average Pooling · Max Pooling · Global Average Pooling · Convolution · Kaiming Initialization · Color Jitter · Softmax · Linear Layer · Random Resized Crop
