Accelerating Transformers with Spectrum-Preserving Token Merging

Hoai-Chau Tran; Duy M. H. Nguyen; Duy M. Nguyen; Trung-Tin Nguyen; Ngan Le; Pengtao Xie; Daniel Sonntag; James Y. Zou; Binh T. Nguyen; Mathias Niepert

arXiv:2405.16148 · cs.LG · October 31, 2024 · 1 citation

TL;DR

This paper introduces PiToMe, a novel token merging method for Transformers that preserves informative tokens using an energy score, significantly reducing computational costs while maintaining high accuracy in vision and language tasks.

Contribution

PiToMe is a new token-merging paradigm that prioritizes preserving informative tokens by means of an energy score, outperforming prior merging algorithms in both efficiency and accuracy.
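To make the idea of an energy score concrete, here is a minimal NumPy sketch under stated assumptions. We assume a token's score is the average of its cosine similarities to all other tokens, passed through a margin function that keeps similarities above a threshold `margin` and maps the rest to small negative values with an ELU-style penalty. The function name `energy_scores` and the parameters `margin` and `alpha` are hypothetical, and the paper's exact definition may differ. Under this reading, a token inside a dense cluster of near-duplicates gets high energy and is a merge candidate, while an isolated, distinctive token gets low energy and is preserved.

```python
import numpy as np


def energy_scores(x: np.ndarray, margin: float = 0.5, alpha: float = 1.0) -> np.ndarray:
    """Hypothetical energy score per token (assumed form, not the official code).

    x: (N, d) token features with N >= 2. Returns an (N,) array;
    higher values mean the token is more redundant with its neighbors.
    """
    # Pairwise cosine similarity between all tokens.
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    sim = xn @ xn.T
    # Assumed margin function: similarities above the margin count as-is,
    # the rest are squashed into (-alpha, 0) by an ELU-style penalty.
    f = np.where(sim >= margin, sim, alpha * (np.exp(sim - margin) - 1.0))
    # Ignore each token's similarity with itself, then average over the others.
    np.fill_diagonal(f, 0.0)
    return f.sum(axis=1) / (x.shape[0] - 1)


# Three nearly parallel tokens plus one orthogonal outlier.
tokens = np.array([[1.0, 0.0], [0.99, 0.1], [0.98, 0.05], [0.0, 1.0]])
scores = energy_scores(tokens)
print(int(np.argmin(scores)))  # 3: the outlier has the lowest energy
```

In this toy input the three clustered tokens score positive and the outlier scores negative, so a merger that protects low-energy tokens would leave the outlier untouched.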

Findings

01

Saved 40-60% of FLOPs in the evaluated models

02

Incurred only a 0.5% performance drop in image classification

03

Preserved the spectral properties of the token space

Abstract

Increasing the throughput of the Transformer architecture, a foundational component used in numerous state-of-the-art models for vision and language tasks (e.g., GPT, LLaVa), is an important problem in machine learning. One recent and effective strategy is to merge token representations within Transformer models, aiming to reduce computational and memory requirements while maintaining accuracy. Prior works have proposed algorithms based on Bipartite Soft Matching (BSM), which divides tokens into distinct sets and merges the top k similar tokens. However, these methods have significant drawbacks, such as sensitivity to token-splitting strategies and damage to informative tokens in later layers. This paper presents a novel paradigm called PiToMe, which prioritizes the preservation of informative tokens using an additional metric termed the energy score. This score identifies large…
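For reference, Bipartite Soft Matching, the baseline the abstract criticizes, can be sketched as follows. This is a simplified ToMe-style version, not PiToMe: tokens are split alternately into sets A and B, each A token is matched to its most similar B token by cosine similarity, and the `r` most similar pairs are merged by averaging. The function name and the unweighted mean are our simplifications (ToMe itself tracks token sizes for a weighted average). The fixed alternating split is exactly the kind of token-splitting choice the abstract says these methods are sensitive to.

```python
import numpy as np


def bipartite_soft_matching(x: np.ndarray, r: int) -> np.ndarray:
    """Merge r tokens via bipartite soft matching (simplified ToMe-style sketch).

    x: (N, d) token features with N >= 2. Returns (N - r', d), where
    r' = min(r, |A|).
    """
    a, b = x[0::2], x[1::2]          # alternating split into sets A and B
    r = min(r, a.shape[0])           # at most |A| tokens can be merged away
    an = a / np.linalg.norm(a, axis=1, keepdims=True)
    bn = b / np.linalg.norm(b, axis=1, keepdims=True)
    sim = an @ bn.T                  # (|A|, |B|) cosine similarities
    best_b = sim.argmax(axis=1)      # most similar partner in B for each A token
    best_sim = sim.max(axis=1)
    order = np.argsort(-best_sim)    # A tokens, most redundant first
    merged_a, kept_a = order[:r], order[r:]

    # Average each merged A token into its B partner (unweighted mean).
    b_sum = b.copy()
    counts = np.ones(b.shape[0])
    np.add.at(b_sum, best_b[merged_a], a[merged_a])
    np.add.at(counts, best_b[merged_a], 1.0)
    b_new = b_sum / counts[:, None]

    # Unmerged A tokens keep their original relative order.
    return np.concatenate([a[np.sort(kept_a)], b_new], axis=0)


x = np.random.default_rng(0).normal(size=(8, 4))
y = bipartite_soft_matching(x, r=3)
print(y.shape)  # (5, 4)
```

Each call removes `r` tokens (capped at the size of A), so applying it in every layer shrinks the sequence progressively, which is where the computational savings of merging come from.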

Code & Models

Repositories

hchautran/PiToMe · PyTorch · Official

Videos

Accelerating Transformers with Spectrum-Preserving Token Merging · SlidesLive

Taxonomy

Topics: Algorithms and Data Compression · Retinal Imaging and Analysis · Distributed systems and fault tolerance

Methods: Attention Is All You Need · Linear Layer · Cosine Annealing · Label Smoothing · Absolute Position Encodings · Multi-Head Attention · Softmax · Linear Warmup With Cosine Annealing · Adam