Training Noise Token Pruning

Mingxing Rao; Bohan Jiang; Daniel Moyer

arXiv:2411.18092 · cs.CV · March 17, 2025

TL;DR

This paper introduces Training Noise Token (TNT) Pruning, a method for efficiently reducing tokens in vision transformers by using continuous noise during training, leading to better optimization and deployment efficiency.

Contribution

TNT Pruning is a novel approach that relaxes discrete token dropping with continuous noise, improving training and deployment in vision transformers.

Findings

1. TNT outperforms previous pruning methods on ImageNet.
2. Theoretical links to Rate-Distortion theory are established.
3. Empirical results show improved efficiency and accuracy.

Abstract

In this work we present Training Noise Token (TNT) Pruning for vision transformers. Our method relaxes the discrete token dropping condition to continuous additive noise, which provides smooth optimization during training while retaining the computational gains of discrete dropping at deployment. We provide theoretical connections to the Rate-Distortion literature, and empirical evaluations on the ImageNet dataset using ViT and DeiT architectures that demonstrate TNT's advantages over previous pruning methods.
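The train/deploy split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `add_training_noise` and `prune_tokens`, the per-token `keep_scores`, and the choice of noise scale `noise_scale * (1 - score)` are all assumptions made for illustration, and the paper's actual noise allocation may differ. The sketch only shows the general idea that training perturbs low-importance tokens with continuous noise, while deployment drops them outright.

```python
import random


def add_training_noise(tokens, keep_scores, noise_scale, rng):
    """Training-time relaxation (illustrative assumption, not the paper's formula).

    Each token receives additive Gaussian noise whose standard deviation
    grows as its keep score shrinks, so a score of 1.0 leaves the token
    untouched and a score near 0.0 mostly drowns it in noise.
    """
    noisy = []
    for token, score in zip(tokens, keep_scores):
        sigma = noise_scale * (1.0 - score)
        noisy.append([x + rng.gauss(0.0, sigma) for x in token])
    return noisy


def prune_tokens(tokens, keep_scores, k):
    """Deployment-time discrete dropping: keep the k highest-scoring tokens.

    The surviving tokens stay in their original order.
    """
    ranked = sorted(range(len(tokens)), key=lambda i: keep_scores[i], reverse=True)
    kept = sorted(ranked[:k])
    return [tokens[i] for i in kept]


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # three toy token embeddings
    keep_scores = [1.0, 0.2, 0.7]                  # hypothetical importance scores

    print(add_training_noise(tokens, keep_scores, 0.5, rng))
    print(prune_tokens(tokens, keep_scores, k=2))
```

Because the noisy path keeps every token and changes continuously with the scores, it is a smoother training target than a hard keep/drop decision. The pruning path is what yields the computational savings, since dropped tokens never reach later layers.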

Code & Models

Repositories

mx-ethan-rao/tnt
PyTorch · Official

Taxonomy

Topics: Music Technology and Sound Studies

Methods: Attention Is All You Need · Dense Connections · Feedforward Network · Linear Layer · Attention Dropout · Softmax · Multi-Head Attention · Dropout · Data-efficient Image Transformer · Pruning