Attention Mechanism with Energy-Friendly Operations
Yu Wan, Baosong Yang, Dayiheng Liu, Rong Xiao, Derek F. Wong, Haibo Zhang, Boxing Chen, Lidia S. Chao

TL;DR
This paper introduces an energy-efficient attention mechanism for NLP that replaces multiplications with less energy-consuming operations, maintaining accuracy while significantly reducing energy consumption.
Contribution
It proposes a novel attention model that replaces multiplications with selective operations or additions, achieving accuracy comparable to vanilla attention with substantial energy savings.
Findings
Achieves 99% energy reduction in alignment calculation
Maintains competitive accuracy on three machine translation tasks
Reduces energy consumption by 66% over the whole attention procedure
Abstract
The attention mechanism has become the dominant module in natural language processing models. It is computationally intensive and depends on massive numbers of power-hungry multiplications. In this paper, we rethink variants of the attention mechanism from the perspective of energy consumption. After reaching the conclusion that the energy costs of several energy-friendly operations are far lower than those of their multiplication counterparts, we build a novel attention model by replacing multiplications with either selective operations or additions. Empirical results on three machine translation tasks demonstrate that the proposed model, compared with the vanilla one, achieves comparable accuracy while saving 99% and 66% of the energy during alignment calculation and the whole attention procedure, respectively. Code is available at: https://github.com/NLP2CT/E-Att.
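To make the idea of replacing multiplications with selective operations and additions concrete, here is a minimal sketch of one way an alignment score could be computed without any multiplication. It is an illustration under stated assumptions, not necessarily the formulation used in the paper or in the E-Att repository: the thresholding step, the function names (`binarize`, `selective_score`, `alignment_scores`), and the `threshold` parameter are all hypothetical.

```python
# Hypothetical sketch: a multiplication-free alignment score.
# Assumption (not from the paper): the query is binarized by a threshold,
# and each flag selects which key features are accumulated.

def binarize(vec, threshold=0.0):
    """Map each feature to a boolean flag: True when it exceeds the threshold."""
    return [x > threshold for x in vec]


def selective_score(query_bits, key):
    """Score one key using only selection and addition.

    A dot product would compute sum(q_i * k_i); here each q_i is a flag
    that decides whether k_i is added to the running total.
    """
    total = 0.0
    for bit, k in zip(query_bits, key):
        if bit:         # selection stands in for the multiplication q_i * k_i
            total += k  # accumulation uses addition only
    return total


def alignment_scores(query, keys, threshold=0.0):
    """Score every key against a single query vector."""
    bits = binarize(query, threshold)
    return [selective_score(bits, key) for key in keys]


query = [0.5, -1.0, 2.0]
keys = [[1.0, 2.0, 3.0], [0.0, 5.0, -1.0]]
scores = alignment_scores(query, keys)
print(scores)
```

In this toy example the query flags are True, False, True, so each score is the sum of the first and third features of the corresponding key. Any energy saving would come from the inner loop containing comparisons and additions only; the actual figures reported in the abstract depend on the authors' model and measurement setup, which this sketch does not reproduce.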
Taxonomy
Topics: Topic Modeling · Advanced Neural Network Applications · Advanced Graph Neural Networks
