Training Noise Token Pruning
Mingxing Rao, Bohan Jiang, Daniel Moyer

TL;DR
This paper introduces Training Noise Token (TNT) Pruning, which reduces the number of tokens in vision transformers by replacing discrete token dropping with continuous additive noise during training. Optimization becomes smooth in training, while deployment keeps the computational gains of discrete dropping.
Contribution
TNT Pruning relaxes the discrete token dropping condition to continuous additive noise, so the pruning decision can be optimized smoothly during training and still be applied as a hard drop at deployment.
Findings
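In evaluations on ImageNet with ViT and DeiT architectures, TNT shows advantages over previous pruning methods.
The method is connected theoretically to the Rate-Distortion literature.
Discrete dropping is retained at deployment, so the computational gains of pruning carry over.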
Abstract
In this work we present Training Noise Token (TNT) Pruning for vision transformers. Our method relaxes the discrete token dropping condition to continuous additive noise, providing smooth optimization in training while retaining the computational gains of discrete dropping in deployment settings. We provide theoretical connections to the Rate-Distortion literature, and empirical evaluations on the ImageNet dataset using ViT and DeiT architectures that demonstrate TNT's advantages over previous pruning methods.
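The train/deploy split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the per-token relevance scores, the rule that the noise standard deviation is `noise_scale * (1 - score)`, and the top-k selection at deployment are all assumptions made for illustration, as are the function names `noisy_tokens` and `drop_tokens`.

```python
import numpy as np


def noisy_tokens(tokens, scores, noise_scale=1.0, rng=None):
    """Training-time relaxation (assumed form): add Gaussian noise to tokens.

    tokens: (n_tokens, dim) array of token embeddings.
    scores: (n_tokens,) hypothetical relevance scores in [0, 1].
    """
    rng = np.random.default_rng() if rng is None else rng
    scores = np.asarray(scores, dtype=float)
    # Assumption: low-scoring tokens receive more noise, high-scoring less.
    sigma = noise_scale * (1.0 - scores)[:, None]
    return tokens + sigma * rng.standard_normal(tokens.shape)


def drop_tokens(tokens, scores, keep):
    """Deployment-time pruning (assumed form): keep the top-`keep` tokens."""
    scores = np.asarray(scores, dtype=float)
    # argsort is ascending: take the last `keep` indices, then restore order.
    kept = np.sort(np.argsort(scores)[-keep:])
    return tokens[kept], kept


tokens = np.arange(12, dtype=float).reshape(4, 3)
scores = np.array([0.9, 0.1, 1.0, 0.4])

train_view = noisy_tokens(tokens, scores, rng=np.random.default_rng(0))
deploy_view, kept = drop_tokens(tokens, scores, keep=2)
```

In this sketch the training path keeps every token and only perturbs them, so the output is a smooth function of the scores, while the deployment path shortens the sequence and is where any compute saving would come from.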
Taxonomy
Topics: Music Technology and Sound Studies
Methods: Attention Is All You Need · Dense Connections · Feedforward Network · Linear Layer · Attention Dropout · Softmax · Multi-Head Attention · Dropout · Data-efficient Image Transformer · Pruning
