Evolving Layer-Specific Scalar Functions for Hardware-Aware Transformer Adaptation
Kieran Carrigg, Sigur de Vries, Amirhossein Sadough, Marcel van Gerven

TL;DR
This paper introduces a hardware-aware evolutionary approach for adapting Vision Transformers to edge devices: it evolves layer-specific scalar functions that replace layer normalization, reducing computational complexity while maintaining high accuracy.
Contribution
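It presents a novel genetic programming framework that evolves layer-specific scalar functions directly from pre-trained weights. Paired with a post-training re-alignment strategy, it removes the need to retrain models from scratch and improves hardware efficiency.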
Findings
Evolved expressions capture 91.6% of the variance in the target normalization behaviour.
Recovers 84.25% Top-1 accuracy on ImageNet-1K in 20 epochs.
Reduces the global reduction bottleneck imposed by layer normalization, enabling efficient edge deployment.
Abstract
Vision Transformers (ViTs) achieve state-of-the-art performance on challenging vision tasks, but their deployment on edge devices is severely hindered by the computational complexity and global reduction bottleneck imposed by layer normalization. Recent methods attempt to bypass this by replacing normalization layers with hardware-friendly scalar approximations. However, these homogeneous replacements do not fit every layer's behaviour optimally and rely on expensive model retraining. In this work, we propose a highly efficient, hardware-aware framework that utilizes genetic programming (GP) to evolve heterogeneous, layer-specific scalar functions directly from pre-trained weights. Coupled with a novel post-training re-alignment strategy, our approach entirely eliminates the need to retrain models from scratch. Our evolved expressions accurately approximate the target normalization…
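To make the "global reduction bottleneck" concrete, the following is a minimal sketch, not the paper's method. It contrasts standard layer normalization, which must reduce over the whole feature vector before producing any output, with an element-wise scalar function. The scaled `tanh` stand-in, the `alpha` value, and the `r_squared` fit measure are assumptions chosen for illustration. The evolved expressions and the exact metric behind the reported variance figure are not given in this summary.

```python
import math


def layer_norm(x, eps=1e-5):
    """Standard layer normalization over one token's feature vector.

    Two global reductions (mean and variance) over all features are
    required before any single output element can be computed.
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std for v in x]


def scalar_replacement(x, alpha=0.5):
    """Hypothetical element-wise stand-in (scaled tanh, assumed for illustration).

    Each output depends only on its own input, so no reduction is needed.
    """
    return [math.tanh(alpha * v) for v in x]


def r_squared(target, approx):
    """Fraction of the target's variance captured by the approximation.

    One common fit measure; whether the paper uses it is an assumption.
    """
    mean_t = sum(target) / len(target)
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    ss_res = sum((t - a) ** 2 for t, a in zip(target, approx))
    return 1.0 - ss_res / ss_tot


x = [-2.0, -1.0, 0.0, 1.0, 2.0]
target = layer_norm(x)
approx = scalar_replacement(x)
print(r_squared(target, approx))
```

A layer-specific variant would fit a different function, or different constants such as `alpha`, to the recorded input/output behaviour of each normalization layer rather than sharing one function across the network.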
