EcoTransformer: Attention without Multiplication

Xin Gao; Xingming Xu; Shirin Amiraslani; Hong Xu

arXiv:2507.20096·cs.LG·August 7, 2025

TL;DR

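EcoTransformer replaces scaled dot-product attention with a Laplacian-kernel convolution over the values, using L1 distances between queries and keys so that attention scores are computed without matrix multiplication. The authors report significantly lower energy consumption, with performance on par with or better than dot-product attention on NLP, bioinformatics, and vision tasks.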

Contribution

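It proposes an attention mechanism that builds each output context vector as a convolution of the values with a Laplacian kernel, where distances between queries and keys are measured with the L1 metric. The attention score calculation requires no matrix multiplication, which lowers energy cost.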

Findings

01

Performs on par with, or better than, scaled dot-product attention on NLP, bioinformatics, and vision tasks

02

Consumes significantly less energy than scaled dot-product attention

03

This version (v2) supersedes v1 and reflects the intended release and licensing

Abstract

The Transformer, with its scaled dot-product attention mechanism, has become a foundational architecture in modern AI. However, this mechanism is computationally intensive and incurs substantial energy costs. We propose a new Transformer architecture EcoTransformer, in which the output context vector is constructed as the convolution of the values using a Laplacian kernel, where the distances are measured by the L1 metric between the queries and keys. Compared to dot-product based attention, the new attention score calculation is free of matrix multiplication. It performs on par with, or even surpasses, scaled dot-product attention in NLP, bioinformatics, and vision tasks, while consuming significantly less energy. (This version (v2) supersedes v1 and reflects the intended release and licensing.)
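The mechanism described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `l1_laplacian_attention`, the `scale` parameter, and its default of `sqrt(d)` are assumptions made for illustration, and the row normalization via softmax is one plausible way to turn a Laplacian kernel, exp(-||q - k||_1 / scale), into attention weights. Only the score calculation avoids matrix multiplication here; the final weighting of the values still uses an ordinary product.

```python
import numpy as np

def l1_laplacian_attention(Q, K, V, scale=None):
    """Hypothetical sketch of attention with L1-distance (Laplacian kernel) scores.

    Q: (n_q, d) queries, K: (n_k, d) keys, V: (n_k, d_v) values.
    Returns the context vectors (n_q, d_v) and the attention weights (n_q, n_k).
    """
    d = Q.shape[-1]
    if scale is None:
        scale = np.sqrt(d)  # assumption: mirrors the usual dot-product scaling

    # Pairwise L1 distances between every query and every key.
    # This step uses only subtraction, absolute value, and addition.
    dist = np.abs(Q[:, None, :] - K[None, :, :]).sum(axis=-1)  # (n_q, n_k)

    # Laplacian kernel weights, normalized per query (a softmax over -dist / scale).
    scores = -dist / scale
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Kernel-weighted combination of the values.
    return weights @ V, weights
```

Under these assumptions, a query that coincides with a key has zero L1 distance to it and therefore receives the largest weight in its row, which plays the role that a large dot product plays in standard attention.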
