Understanding How Encoder-Decoder Architectures Attend
Kyle Aitken, Vinay V Ramasesh, Yuan Cao, Niru Maheswaranathan

TL;DR
This paper investigates how encoder-decoder networks with attention form their attention matrices. It decomposes hidden states into temporal and input-driven components and uses this decomposition to compare recurrent and feed-forward architectures.
Contribution
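It introduces a new way of decomposing hidden states over a sequence into a temporal component (independent of the input) and an input-driven component (independent of sequence position). This decomposition clarifies how attention matrices are generated in different encoder-decoder architectures.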
Findings
Depending on task requirements, networks rely more heavily on either the temporal or the input-driven components to form attention matrices.
The decomposition applies consistently across recurrent and feed-forward architectures.
The analysis provides new insight into the inner workings of attention in sequence-to-sequence models.
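To make the first finding concrete, here is a minimal NumPy sketch. It assumes plain dot-product attention scores between decoder and encoder states; a bias-free linear query/key projection could be applied to each part first without changing the argument. If each state is written as a sum of parts (for example temporal, input-driven, and residual), the score matrix is exactly the sum of one term per (decoder part, encoder part) pair, so one can measure which pairs dominate. The function names `score_terms` and `dominant_terms` and the part labels are hypothetical illustrations, not the authors' code.

```python
import numpy as np

def score_terms(dec_parts, enc_parts):
    """Split dot-product attention scores into one term per (decoder part, encoder part) pair.

    dec_parts: dict mapping a part name to an array of shape (dec_len, dim)
    enc_parts: dict mapping a part name to an array of shape (enc_len, dim)
    Returns a dict mapping (dec_name, enc_name) to a (dec_len, enc_len) score contribution.
    """
    return {
        (d_name, e_name): d @ e.T
        for d_name, d in dec_parts.items()
        for e_name, e in enc_parts.items()
    }

def dominant_terms(terms):
    """Rank the pairwise terms by Frobenius norm, largest first."""
    return sorted(terms, key=lambda k: np.linalg.norm(terms[k]), reverse=True)

# Hypothetical example: random temporal / input-driven / residual parts.
rng = np.random.default_rng(0)
names = ("temporal", "input", "residual")
dec = {n: rng.normal(size=(4, 8)) for n in names}
enc = {n: rng.normal(size=(5, 8)) for n in names}

terms = score_terms(dec, enc)
full = sum(dec.values()) @ sum(enc.values()).T
# The nine pairwise terms add up exactly to the full score matrix.
assert np.allclose(sum(terms.values()), full)
print(dominant_terms(terms)[:3])
```

With real hidden states, a large temporal-temporal term would indicate that the attention pattern is driven mainly by sequence position, while a large input-input term would indicate that it is driven mainly by token identity.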
Abstract
Encoder-decoder networks with attention have proven to be a powerful way to solve many sequence-to-sequence tasks. In these networks, attention aligns encoder and decoder states and is often used for visualizing network behavior. However, the mechanisms used by networks to generate appropriate attention matrices are still mysterious. Moreover, how these mechanisms vary depending on the particular architecture used for the encoder and decoder (recurrent, feed-forward, etc.) is also not well understood. In this work, we investigate how encoder-decoder networks solve different sequence-to-sequence tasks. We introduce a way of decomposing hidden states over a sequence into temporal (independent of input) and input-driven (independent of sequence position) components. This reveals how attention matrices are formed: depending on the task requirements, networks rely more heavily on either the…
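The decomposition described in the abstract can be sketched as follows, assuming each component is estimated by averaging over a batch of sequences. The temporal component at position t is the mean hidden state across sequences at that position; the input-driven component for a token is the mean of the remaining (centered) states over all occurrences of that token; whatever is left is a residual. This averaging scheme and the name `decompose_hidden_states` are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def decompose_hidden_states(h, tokens):
    """Split hidden states into temporal, input-driven, and residual parts (sketch).

    h:      float array, shape (num_seqs, seq_len, hidden_dim)
    tokens: int array, shape (num_seqs, seq_len), the input token at each position
    Returns (temporal, input_driven, residual) with
    h == temporal[None] + input_driven + residual.
    """
    # Temporal part: average over sequences at each position, so it ignores the input.
    temporal = h.mean(axis=0)                      # (seq_len, hidden_dim)
    centered = h - temporal[None, :, :]

    # Input-driven part: average the centered states over every occurrence of a token,
    # so it ignores the position where the token appeared.
    input_driven = np.zeros_like(h)
    for tok in np.unique(tokens):
        mask = tokens == tok                       # (num_seqs, seq_len) boolean
        input_driven[mask] = centered[mask].mean(axis=0)

    residual = centered - input_driven
    return temporal, input_driven, residual

# Toy check: states built as position embedding + token embedding, with every token
# appearing once at every position (a Latin square), decompose with zero residual.
rng = np.random.default_rng(0)
T, V, D = 3, 3, 4
pos_emb = rng.normal(size=(T, D))
tok_emb = rng.normal(size=(V, D))
tokens = np.array([[(b + t) % V for t in range(T)] for b in range(V)])
h = pos_emb[None, :, :] + tok_emb[tokens]

temporal, input_driven, residual = decompose_hidden_states(h, tokens)
assert np.allclose(temporal[None] + input_driven + residual, h)
assert np.allclose(residual, 0.0)
```

In this toy case the residual vanishes because position and token contribute additively and every token appears equally often at every position; for hidden states from a trained network it generally will not.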
