Energy-Gated Attention: Spectral Salience as an Inductive Bias for Transformer Attention

Athanasios Zeris

arXiv:2605.21842 · cs.LG · May 22, 2026

TL;DR

This paper introduces Energy-Gated Attention (EGA), a simple modification to transformer attention that gates value aggregation by the spectral energy of key token embeddings, so that tokens with higher informational content receive more weight. It is evaluated on TinyShakespeare and Penn Treebank.

Contribution

The paper proposes EGA, a spectral energy gating mechanism for transformer attention, shows that its gains carry over across datasets, and explores which wavelet bases work best for the spectral analysis.

Findings

01

EGA improves validation loss on TinyShakespeare by +0.103 with minimal overhead.

02

EGA achieves a similar improvement on Penn Treebank (+0.101).

03

Learned spectral energy thresholds align with linguistic properties of English text.

Abstract

Standard transformer attention computes pairwise similarity between queries and keys, treating all tokens as equally salient regardless of their intrinsic informational content. In turbulent fluid dynamics, coherent structures -- the energetically dominant, spatially organized patterns that persist amid background chaos -- carry a disproportionate fraction of total energy and govern all transport. We propose that tokens play an analogous role in transformer attention: informationally dense positions (morphological boundaries, syntactic heads, discourse markers) concentrate spectral energy and should attract proportionally more attention than background tokens (function words, repeated patterns, low-information filler). We propose Energy-Gated Attention (EGA): a simple modification that gates value aggregation by the spectral energy of key token embeddings, computed by a single learned…
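The gating idea in the abstract can be illustrated with a minimal sketch. The abstract is cut off before it says how spectral energy is computed, so the code below substitutes a placeholder (the mean squared magnitude of each key embedding) and should not be read as the paper's method. The names `token_energy`, `energy_gate`, `threshold`, and `sharpness`, and the choice of a sigmoid gate, are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def token_energy(vec):
    # ASSUMPTION: placeholder salience score, not the paper's spectral energy.
    # Here it is simply the mean squared magnitude of the embedding.
    return sum(x * x for x in vec) / len(vec)


def energy_gate(energy, threshold, sharpness=1.0):
    # ASSUMPTION: a sigmoid gate around a threshold; in a trained model the
    # threshold would be a learned parameter.
    return 1.0 / (1.0 + math.exp(-sharpness * (energy - threshold)))


def energy_gated_attention(queries, keys, values, threshold, sharpness=1.0):
    """Scaled dot-product attention whose value aggregation is scaled,
    per key token, by a gate derived from that key's energy."""
    d = len(keys[0])
    gates = [energy_gate(token_energy(k), threshold, sharpness) for k in keys]
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        weights = softmax(scores)
        # Each value vector is weighted by attention weight times its key's gate.
        out = [
            sum(w * g * v[c] for w, g, v in zip(weights, gates, values))
            for c in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs, gates
```

In this sketch a high-energy key keeps most of its contribution to the output while a low-energy key is damped. Setting `sharpness=0` makes every gate 0.5, which reduces the result to standard attention scaled by a constant.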
