Shrinking the Giant: Quasi-Weightless Transformers for Low Energy Inference
Shashank Nag, Alan T. L. Bacellar, Zachary Susskind, Anshul Jha, Logan Liberty, Aishwarya Sivakumar, Eugene B. John, Krishnan Kailas, Priscila M. V. Lima, Neeraja J. Yadwadkar, Felipe M. G. Franca, Lizy K. John

TL;DR
This paper introduces Quasi Weightless Transformers (QuWeiT), a novel approach that replaces parts of transformer models with lookup table-based neural networks to significantly reduce energy consumption while maintaining high accuracy.
Contribution
The paper extends LUT-based neural networks to replace MLP layers in transformers, enabling low-energy, high-efficiency inference without substantial accuracy loss.
Findings
Achieved 95.64% accuracy on CIFAR-10 with 55% fewer multiplications.
Realized 2.2x energy efficiency improvements.
Demonstrated similar savings in nanoGPT experiments.
Abstract
Transformers are set to become ubiquitous, with applications ranging from chatbots and educational assistants to visual recognition and remote sensing. However, their increasing computational and memory demands are resulting in growing energy consumption. Building models with fast and energy-efficient inference is imperative to enable a variety of transformer-based applications. Look Up Table (LUT) based Weightless Neural Networks are faster than conventional neural networks because their inference involves only a few lookup operations. Recently, an approach for learning LUT networks directly via an Extended Finite Difference method was proposed. We build on this idea, extending it to perform the functions of the Multi Layer Perceptron (MLP) layers in transformer models and integrating the resulting LUT layers with transformers to propose Quasi Weightless Transformers (QuWeiT). This allows for a…
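To make the "few lookup operations" claim concrete, here is a minimal sketch of inference through one layer of LUTs. It is not the authors' implementation: the function name `lut_forward`, the wiring in `connections`, and the table contents are all hypothetical, and the training procedure (the Extended Finite Difference method) is not shown. Each LUT reads a few binary inputs, packs them into an address with shifts and ORs, and returns the stored entry, so no multiplications are involved.

```python
def lut_forward(bits, connections, tables):
    """Evaluate one layer of lookup tables on a binary input vector.

    bits        -- list of 0/1 input values
    connections -- for each LUT, the input indices it is wired to (hypothetical wiring)
    tables      -- for each LUT, a list of 2**n stored entries (hypothetical contents)
    """
    outputs = []
    for wires, table in zip(connections, tables):
        address = 0
        for w in wires:
            # Pack the selected bits into an address, most significant bit first.
            address = (address << 1) | bits[w]
        # Inference is a single table read per LUT.
        outputs.append(table[address])
    return outputs


# Illustrative values only: two 2-input LUTs that happen to store AND and XOR.
bits = [1, 1, 0, 1]
connections = [[0, 1], [2, 3]]
tables = [
    [0, 0, 0, 1],  # AND truth table
    [0, 1, 1, 0],  # XOR truth table
]

result = lut_forward(bits, connections, tables)
print(result)  # [1, 1]
```

In this toy example the first LUT sees inputs (1, 1) and reads address 3, and the second sees (0, 1) and reads address 1. A learned LUT layer would differ in table contents and wiring, but the inference cost per LUT remains one address computation and one read.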
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies
Methods: Sparse Evolutionary Training
