LOTUS: Improving Transformer Efficiency with Sparsity Pruning and Data Lottery Tickets
Ojasw Upadhyay

TL;DR
This paper presents LOTUS, a method that combines data lottery ticket selection and sparsity pruning to significantly improve the training efficiency of vision transformers without sacrificing accuracy.
Contribution
Introducing LOTUS, a novel approach that leverages data lottery tickets and sparsity pruning to accelerate vision transformer training and reduce computational costs.
Findings
Achieves rapid convergence with high accuracy
Reduces computational requirements significantly
Effective combination of data selection and sparsity techniques
Abstract
Vision transformers have revolutionized computer vision, but their computational demands present challenges for training and deployment. This paper introduces LOTUS (LOttery Transformers with Ultra Sparsity), a novel method that leverages data lottery ticket selection and sparsity pruning to accelerate vision transformer training while maintaining accuracy. Our approach focuses on identifying and utilizing the most informative data subsets and eliminating redundant model parameters to optimize the training process. Through extensive experiments, we demonstrate the effectiveness of LOTUS in achieving rapid convergence and high accuracy with significantly reduced computational requirements. This work highlights the potential of combining data selection and sparsity techniques for efficient vision transformer training, opening doors for further research and development in this area.
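The abstract names two ingredients: keeping only the most informative training examples and removing redundant parameters. The sketch below illustrates one common form of each, under stated assumptions. It is not the authors' implementation: `magnitude_prune` uses plain magnitude pruning as a stand-in for the paper's sparsity criterion, and `select_subset` assumes a per-example informativeness score (hypothetical here, for instance a loss or saliency value) is already available. Both function names are illustrative.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of a flat weight list.

    Assumption: magnitude is used as the redundancy criterion; the paper's
    actual pruning rule is not specified in this summary.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def select_subset(scores, keep_fraction):
    """Return the indices of the highest-scoring examples, in ascending order.

    Assumption: `scores` holds one hypothetical informativeness value per
    training example; how such scores are computed is left open.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    n_keep = max(1, int(len(scores) * keep_fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:n_keep])


if __name__ == "__main__":
    print(magnitude_prune([0.5, -0.1, 0.05, -2.0], sparsity=0.5))
    print(select_subset([0.2, 0.9, 0.1, 0.7], keep_fraction=0.5))
```

In a real training loop the pruning mask would be applied per layer to tensors rather than to a flat list, and the selected indices would drive a dataset subset; both details are omitted here to keep the sketch dependency-free.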
Taxonomy
Topics: Power Transformer Diagnostics and Insulation · Power Quality and Harmonics · Electricity Theft Detection Techniques
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Multi-Head Attention · Dense Connections · Residual Connection · Softmax · Vision Transformer · Pruning
