TOAST: Transformer Optimization using Adaptive and Simple Transformations

Irene Cannistraci; Simone Antonelli; Emanuele Palumbo; Thomas M. Sutter; Emanuele Rodolà; Bastian Rieck; Julia E. Vogt

arXiv:2410.04941·cs.LG·May 19, 2026

TL;DR

TOAST optimizes transformers by replacing redundant internal components with simple, training-free transformations, reducing parameters and computation without sacrificing performance.

Contribution

TOAST exploits intra-network redundancies to approximate transformer blocks with lightweight mappings, enabling efficient model compression without retraining.

Findings

01

Reduces parameters and computation in vision transformers

02

Preserves or improves downstream performance

03

Applicable across multiple pretrained models and datasets

Abstract

Foundation models achieve state-of-the-art performance across different tasks, but their size and computational demands raise concerns about accessibility and sustainability. Existing efficiency methods often require additional retraining or finetuning, limiting their practicality. Recent findings suggest that deep neural networks exhibit internal representation similarities. While such similarities across different models have been exploited for enabling techniques such as model stitching and merging, intra-network redundancy remains underexplored as a source for efficiency gains. In this paper, we introduce Transformer Optimization using Adaptive and Simple Transformations (TOAST), a framework that exploits these redundancies to approximate entire transformer blocks with lightweight closed-form mappings, such as linear transformations or even the identity function, without any…
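The idea of replacing a block with a closed-form linear mapping can be illustrated with a minimal sketch. It assumes the mapping is fit by ordinary least squares between a block's input and output token representations; the abstract does not confirm that this is the exact estimator the paper uses. The function names (`fit_linear_map`, `apply_linear_map`) and the synthetic data standing in for real activations are hypothetical.

```python
import numpy as np


def fit_linear_map(x_in, x_out):
    """Least-squares fit of (weight, bias) so that x_in @ weight + bias ~ x_out.

    Assumption: ordinary least squares; the paper's estimator may differ.
    """
    # Append a constant column so the bias is estimated jointly with the weight.
    ones = np.ones((x_in.shape[0], 1))
    x_aug = np.hstack([x_in, ones])
    solution, *_ = np.linalg.lstsq(x_aug, x_out, rcond=None)
    return solution[:-1], solution[-1]


def apply_linear_map(x, weight, bias):
    """Cheap stand-in for the approximated block: one matmul and one add."""
    return x @ weight + bias


# Synthetic stand-in for activations collected before and after one block.
rng = np.random.default_rng(0)
n_tokens, dim = 512, 16
x_in = rng.normal(size=(n_tokens, dim))
# Hypothetical "block": close to linear, with a small nonlinear component.
true_w = np.eye(dim) + 0.1 * rng.normal(size=(dim, dim))
x_out = x_in @ true_w + 0.01 * np.tanh(x_in)

weight, bias = fit_linear_map(x_in, x_out)
approx = apply_linear_map(x_in, weight, bias)

# Relative error of the fitted linear map versus skipping the block (identity).
rel_err = np.linalg.norm(approx - x_out) / np.linalg.norm(x_out)
identity_err = np.linalg.norm(x_in - x_out) / np.linalg.norm(x_out)
print(f"linear map error: {rel_err:.4f}, identity error: {identity_err:.4f}")
```

On the data used for fitting, the linear map can never do worse than the identity, since the identity is the special case of weight equal to the identity matrix and bias equal to zero. How well either approximation holds on real transformer activations is an empirical question that this sketch does not address.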

Taxonomy

Topics: Neural Networks and Applications