Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training
Zhenglun Kong, Haoyu Ma, Geng Yuan, Mengshu Sun, Yanyue Xie, Peiyan Dong, Xin Meng, Xuan Shen, Hao Tang, Minghai Qin, Tianlong Chen, Xiaolong Ma, Xiaohui Xie, Zhangyang Wang, Yanzhi Wang

TL;DR
This paper introduces Tri-Level E-ViT, a hierarchical data redundancy reduction framework that accelerates Vision Transformer training by exploiting sparsity at multiple levels, often improving accuracy while reducing training time.
Contribution
It presents a novel end-to-end training method that reduces data redundancy across three hierarchical levels, enhancing efficiency without sacrificing accuracy.
Findings
Achieves up to 15.7% training speedup on ViT models.
Maintains or slightly improves Top-1 accuracy during acceleration.
Demonstrates the existence of significant data redundancy in ViT training.
Abstract
Vision transformers (ViTs) have recently obtained success in many applications, but their intensive computation and heavy memory usage at both training and inference time limit their generalization. Previous compression algorithms usually start from the pre-trained dense models and only focus on efficient inference, while time-consuming training is still unavoidable. In contrast, this paper points out that the million-scale training data is redundant, which is the fundamental reason for the tedious training. To address the issue, this paper aims to introduce sparsity into data and proposes an end-to-end efficient training framework from three sparse perspectives, dubbed Tri-Level E-ViT. Specifically, we leverage a hierarchical data redundancy reduction scheme, by exploring the sparsity under three levels: number of training examples in the dataset, number of patches (tokens) in each…
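The first two levels named in the abstract (fewer training examples, fewer patches per image) can be illustrated with a minimal sketch. This is not the authors' method: the scoring is a random placeholder, the function names keep_top_fraction and reduce_batch are hypothetical, and the third level is not modeled because the abstract text above is truncated before it is named.

```python
import random


def keep_top_fraction(scores, keep_ratio):
    """Return sorted indices of the highest-scoring items, keeping ~keep_ratio of them."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, round(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


def reduce_batch(images, example_scores, patch_scores, example_keep, patch_keep):
    """Level 1: drop whole examples. Level 2: drop patches inside each kept example."""
    kept_examples = keep_top_fraction(example_scores, example_keep)
    reduced = []
    for i in kept_examples:
        kept_patches = keep_top_fraction(patch_scores[i], patch_keep)
        reduced.append([images[i][p] for p in kept_patches])
    return kept_examples, reduced


# Toy data: 10 "images", each a list of 16 patch labels.
# Scores are random stand-ins (assumption); a real system would use a
# learned or attention-derived importance measure.
rng = random.Random(0)
images = [[f"img{i}_patch{p}" for p in range(16)] for i in range(10)]
example_scores = [rng.random() for _ in images]
patch_scores = [[rng.random() for _ in img] for img in images]

kept, reduced = reduce_batch(images, example_scores, patch_scores,
                             example_keep=0.8, patch_keep=0.75)
print(len(kept), len(reduced[0]))  # 8 examples kept, 12 patches each
```

With these illustrative ratios, each step processes 8 x 12 = 96 patches instead of 10 x 16 = 160, which is the kind of compounding saving a hierarchical scheme is after.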
Taxonomy
Topics: Advanced Image Fusion Techniques · CCD and CMOS Imaging Sensors · Image Enhancement Techniques
