TOAST: Transformer Optimization using Adaptive and Simple Transformations
Irene Cannistraci, Simone Antonelli, Emanuele Palumbo, Thomas M. Sutter, Emanuele Rodolà, Bastian Rieck, Julia E. Vogt

TL;DR
TOAST introduces a method to optimize transformers by replacing redundant internal components with simple, training-free transformations, reducing parameters and computation without sacrificing performance.
Contribution
TOAST exploits intra-network redundancies to approximate entire transformer blocks with lightweight closed-form mappings, enabling model compression without retraining.
Findings
Reduces parameters and computation in vision transformers
Preserves or improves downstream performance
Applicable across multiple pretrained models and datasets
Abstract
Foundation models achieve state-of-the-art performance across different tasks, but their size and computational demands raise concerns about accessibility and sustainability. Existing efficiency methods often require additional retraining or finetuning, limiting their practicality. Recent findings suggest that deep neural networks exhibit internal representation similarities. While such similarities across different models have been exploited for enabling techniques such as model stitching and merging, intra-network redundancy remains underexplored as a source for efficiency gains. In this paper, we introduce Transformer Optimization using Adaptive and Simple Transformations (TOAST), a framework that exploits these redundancies to approximate entire transformer blocks with lightweight closed-form mappings, such as linear transformations or even the identity function, without any…
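To make "lightweight closed-form mappings" concrete, here is a minimal sketch of how a block could be approximated by either the identity or an affine map fitted by least squares. It is an illustration under stated assumptions, not the authors' implementation: we assume the mapping is estimated from activations recorded before and after the replaced block(s) on some batch of inputs (synthetic data stands in for real activations here), and the names `fit_linear_map`, `relative_error`, `choose_replacement` and the tolerance `tol` are hypothetical.

```python
import numpy as np


def fit_linear_map(x_in, x_out):
    """Closed-form least-squares fit of (W, b) so that x_in @ W + b ~= x_out.

    x_in, x_out: arrays of shape (num_tokens, hidden_dim) holding activations
    entering and leaving the block(s) to be replaced (assumed setup).
    """
    # Append a column of ones so the bias is solved jointly with W.
    ones = np.ones((x_in.shape[0], 1))
    x_aug = np.hstack([x_in, ones])
    # lstsq minimizes ||x_aug @ theta - x_out|| in closed form (no training).
    theta, *_ = np.linalg.lstsq(x_aug, x_out, rcond=None)
    return theta[:-1], theta[-1]


def relative_error(pred, target):
    """Frobenius-norm error of pred, relative to the size of target."""
    return np.linalg.norm(pred - target) / np.linalg.norm(target)


def choose_replacement(x_in, x_out, tol=0.05):
    """Hypothetical selection rule: keep the identity if it is already
    within `tol`, otherwise fall back to the fitted affine map."""
    if relative_error(x_in, x_out) <= tol:
        return "identity", None
    w, b = fit_linear_map(x_in, x_out)
    return "linear", (w, b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for block activations: the "block" here is affine.
    x_in = rng.normal(size=(512, 16))
    true_w = rng.normal(size=(16, 16))
    true_b = rng.normal(size=16)
    x_out = x_in @ true_w + true_b

    kind, params = choose_replacement(x_in, x_out)
    print("chosen mapping:", kind)
    if params is not None:
        w, b = params
        print("linear-map relative error:", relative_error(x_in @ w + b, x_out))
```

Real transformer blocks are nonlinear, so on actual activations the fitted map would only approximate the block; whether that approximation is good enough is exactly the kind of redundancy the paper studies.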
Taxonomy
Topics: Neural Networks and Applications
