Accelerating Transformers with Spectrum-Preserving Token Merging
Hoai-Chau Tran, Duy M. H. Nguyen, Duy M. Nguyen, Trung-Tin Nguyen, Ngan Le, Pengtao Xie, Daniel Sonntag, James Y. Zou, Binh T. Nguyen, Mathias Niepert

TL;DR
This paper introduces PiToMe, a novel token merging method for Transformers that preserves informative tokens using an energy score, significantly reducing computational costs while maintaining high accuracy in vision and language tasks.
Contribution
PiToMe is a new paradigm that prioritizes the preservation of informative tokens through an energy score, outperforming prior merging algorithms in both efficiency and accuracy.
Findings
Saved 40-60% of FLOPs in the base models
Incurred only a 0.5% performance drop in image classification
Maintained spectral properties of token space
Abstract
Increasing the throughput of the Transformer architecture, a foundational component used in numerous state-of-the-art models for vision and language tasks (e.g., GPT, LLaVa), is an important problem in machine learning. One recent and effective strategy is to merge token representations within Transformer models, aiming to reduce computational and memory requirements while maintaining accuracy. Prior works have proposed algorithms based on Bipartite Soft Matching (BSM), which divides tokens into two distinct sets and merges the top-k most similar tokens. However, these methods have significant drawbacks, such as sensitivity to the token-splitting strategy and damage to informative tokens in later layers. This paper presents a novel paradigm called PiToMe, which prioritizes the preservation of informative tokens using an additional metric termed the energy score. This score identifies large…
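For readers unfamiliar with the BSM baseline the abstract refers to, below is a minimal NumPy sketch of ToMe-style bipartite soft matching. It assumes an alternating even/odd split into sets A and B, cosine similarity, and plain (unweighted) averaging of merged tokens; the function name `bipartite_soft_matching` is hypothetical. This illustrates the baseline PiToMe improves on, not PiToMe itself.

```python
import numpy as np


def bipartite_soft_matching(x: np.ndarray, r: int) -> np.ndarray:
    """Reduce x (shape [N, D]) by r tokens with a BSM-style merge (sketch).

    Assumptions: even-indexed tokens form set A, odd-indexed tokens form set B,
    similarity is cosine, and merged tokens are averaged without size weights.
    """
    x = np.asarray(x, dtype=float)
    a, b = x[0::2], x[1::2]
    r = min(r, a.shape[0])  # at most every A token can be merged away

    # Cosine similarity between every A token and every B token: [Na, Nb].
    eps = 1e-12
    a_n = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), eps)
    b_n = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), eps)
    scores = a_n @ b_n.T

    # Each A token picks its most similar B token; rank A tokens by that score.
    best_b = scores.argmax(axis=1)
    best_score = scores.max(axis=1)
    order = np.argsort(-best_score)
    merged_a, kept_a = order[:r], order[r:]

    # Average each of the top-r A tokens into its matched B token.
    sums = b.copy()
    counts = np.ones(b.shape[0])
    for i in merged_a:
        j = best_b[i]
        sums[j] += a[i]
        counts[j] += 1
    b_out = sums / counts[:, None]

    # Output keeps the unmerged A tokens (original order) plus the updated B set.
    return np.concatenate([a[np.sort(kept_a)], b_out], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(8, 4))
    print(bipartite_soft_matching(tokens, r=3).shape)  # (5, 4)
```

The fixed alternating split is exactly the kind of token-splitting choice the abstract says BSM is sensitive to: a different partition into A and B can change which pairs get merged.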
Taxonomy
Topics: Algorithms and Data Compression · Retinal Imaging and Analysis · Distributed systems and fault tolerance
Methods: Attention Is All You Need · Linear Layer · Cosine Annealing · Label Smoothing · Absolute Position Encodings · Multi-Head Attention · Softmax · Linear Warmup With Cosine Annealing · Adam
