Attention Mechanism with Energy-Friendly Operations

Yu Wan; Baosong Yang; Dayiheng Liu; Rong Xiao; Derek F. Wong; Haibo Zhang; Boxing Chen; Lidia S. Chao

arXiv:2204.13353 · cs.CL · October 20, 2022

TL;DR

This paper introduces an energy-efficient attention mechanism for NLP that replaces multiplications with less energy-consuming operations, maintaining accuracy while significantly reducing energy consumption.

Contribution

It proposes a novel attention model that substitutes multiplications with energy-friendly operations, achieving comparable accuracy with substantial energy savings.
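
One way to picture how a multiplication can become a "selective operation" is a linear projection with binarized weights. The following is a minimal sketch that assumes weights binarized to {-1, +1} by sign, a common convention for binarized networks that is not confirmed here as the paper's exact scheme. Under that assumption each product x[i] * W[i][j] reduces to choosing +x[i] or -x[i], so the projection needs only additions and subtractions. The helper names `binarize` and `selective_project` are hypothetical and are not taken from the E-Att repository.

```python
from typing import List

# Hypothetical types for this sketch: a dense input vector and a weight
# matrix whose entries have been binarized to {-1, +1}.
Vector = List[float]
BinaryMatrix = List[List[int]]


def binarize(weights: List[List[float]]) -> BinaryMatrix:
    """Binarize real-valued weights by sign: w >= 0 -> +1, w < 0 -> -1."""
    return [[1 if w >= 0 else -1 for w in row] for row in weights]


def selective_project(x: Vector, w_bin: BinaryMatrix) -> Vector:
    """Compute x @ W for W in {-1, +1}^(d_in x d_out) without multiplying.

    Each product x[i] * W[i][j] is replaced by a selection: add x[i] when
    the weight is +1, subtract it when the weight is -1.
    """
    d_out = len(w_bin[0])
    out: Vector = []
    for j in range(d_out):
        acc = 0.0
        for i, xi in enumerate(x):
            if w_bin[i][j] > 0:
                acc += xi  # weight +1: keep the input as-is
            else:
                acc -= xi  # weight -1: flip its sign
        out.append(acc)
    return out


if __name__ == "__main__":
    w = [[0.5, -0.2], [-1.0, 0.3], [0.7, -0.9]]
    print(selective_project([1.0, 2.0, 3.0], binarize(w)))  # [2.0, -2.0]
```

Binarized networks often rescale each output by a real-valued factor. That costs one multiplication per output rather than one per weight, and this sketch omits it.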

Findings

01 · Achieves a 99% energy reduction in alignment calculation

02 · Maintains competitive accuracy on three machine translation tasks

03 · Reduces the energy consumption of the whole attention procedure by 66%

Abstract

The attention mechanism has become the dominant module in natural language processing models. It is computationally intensive and depends on massive numbers of power-hungry multiplications. In this paper, we rethink variants of the attention mechanism from the perspective of energy consumption. After concluding that the energy costs of several energy-friendly operations are far lower than those of their multiplication counterparts, we build a novel attention model by replacing multiplications with either selective operations or additions. Empirical results on three machine translation tasks demonstrate that the proposed model, compared with the vanilla one, achieves competitive accuracy while saving 99% and 66% of the energy during alignment calculation and the whole attention procedure, respectively. Code is available at: https://github.com/NLP2CT/E-Att.
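
The other replacement named in the abstract uses additions. This is a minimal sketch of an addition-based alignment step. It assumes that query-key alignment is scored by the negative L1 distance instead of a dot product. That is one reading of "replacing multiplications with ... additions", not a confirmed reproduction of the paper's formula. Each query-key pair then costs subtractions, absolute values, and additions rather than d multiplications. The functions `l1_scores`, `softmax`, and `l1_attention` are hypothetical names, not code from https://github.com/NLP2CT/E-Att.

```python
import math
from typing import List

Vector = List[float]


def l1_scores(query: Vector, keys: List[Vector]) -> List[float]:
    """Alignment scores as negative L1 distances: s_j = -sum_i |q_i - k_j,i|.

    Each query-key pair costs d subtractions, d absolute values and d - 1
    additions, instead of the d multiplications of a dot product.
    """
    return [-sum(abs(q - k) for q, k in zip(query, key)) for key in keys]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max score first)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def l1_attention(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Attend over values with weights derived from L1-based scores.

    Only the alignment step is multiplication-free here; the weighted sum
    over values below still multiplies each weight by each value entry.
    """
    weights = softmax(l1_scores(query, keys))
    d_v = len(values[0])
    return [sum(w * v[c] for w, v in zip(weights, values)) for c in range(d_v)]


if __name__ == "__main__":
    q = [0.0, 0.0]
    ks = [[0.5, 0.0], [1.0, 1.0]]
    vs = [[1.0, 0.0], [0.0, 1.0]]
    print(l1_scores(q, ks))         # [-0.5, -2.0]
    print(l1_attention(q, ks, vs))  # the closer key receives the larger weight
```

In this sketch the softmax and the weighted sum over values still use multiplications. A saving confined to the alignment step would therefore be smaller when measured over the whole attention procedure.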

Code & Models

Repositories

nlp2ct/e-att
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Advanced Neural Network Applications · Advanced Graph Neural Networks