Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation

Itamar Zimerman; Ameen Ali; Lior Wolf

arXiv:2405.16504 · cs.LG · October 21, 2024 · 1 citation

TL;DR

This paper formulates modern gated-linear RNN layers, such as Mamba and RWKV, as implicit causal self-attention layers. The shared formulation gives a common basis for comparing these models and a direct way to apply attention-based explainability methods to them.

Contribution

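A unified implicit attention formulation for gated-linear RNNs that covers most sub-components of each layer rather than a single part of the architecture, enabling explainability methods and like-for-like comparison across models.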

Findings

01 The proposed attention matrices and attribution method outperform an alternative, more limited formulation recently proposed for Mamba.

02 The framework is effective and competitive with Transformer explainability methods.

03 The formulation applies to several attention-free architectures, including Mamba, RWKV, and other gated RNNs.

Abstract

Recent advances in efficient sequence modeling have led to attention-free layers, such as Mamba, RWKV, and various gated RNNs, all featuring sub-quadratic complexity in sequence length and excellent scaling properties, enabling the construction of a new type of foundation models. In this paper, we present a unified view of these models, formulating such layers as implicit causal self-attention layers. The formulation includes most of their sub-components and is not limited to a specific part of the architecture. The framework compares the underlying mechanisms on similar grounds for different layers and provides a direct means for applying explainability methods. Our experiments show that our attention matrices and attribution method outperform an alternative and a more limited formulation that was recently proposed for Mamba. For the other architectures for which our method is the…
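The idea of reading a recurrent layer as implicit causal attention can be illustrated with a minimal sketch. It assumes a scalar, single-channel gated linear recurrence h_t = a_t * h_{t-1} + b_t * x_t with output y_t = c_t * h_t; this is a simplification for illustration, not the paper's full formulation, which also covers further sub-components of each layer. The function names and the gate symbols a, b, c are hypothetical. Unrolling the recurrence gives y_t = sum over s <= t of alpha[t][s] * x_s, where alpha[t][s] = c_t * (a_{s+1} * ... * a_t) * b_s, so alpha plays the role of a lower-triangular attention matrix.

```python
from typing import List


def run_recurrence(a: List[float], b: List[float], c: List[float],
                   x: List[float]) -> List[float]:
    """Run the assumed scalar recurrence step by step.

    h_t = a_t * h_{t-1} + b_t * x_t,  y_t = c_t * h_t,  with h_{-1} = 0.
    """
    h = 0.0
    ys = []
    for a_t, b_t, c_t, x_t in zip(a, b, c, x):
        h = a_t * h + b_t * x_t
        ys.append(c_t * h)
    return ys


def implicit_attention(a: List[float], b: List[float],
                       c: List[float]) -> List[List[float]]:
    """Build the lower-triangular matrix implied by unrolling the recurrence.

    alpha[t][s] = c_t * (a_{s+1} * ... * a_t) * b_s for s <= t, else 0.
    """
    n = len(a)
    alpha = [[0.0] * n for _ in range(n)]
    for t in range(n):
        decay = 1.0  # empty product when s == t
        for s in range(t, -1, -1):
            alpha[t][s] = c[t] * decay * b[s]
            decay *= a[s]  # extend the product to cover a_s for the next (earlier) s
    return alpha


def apply_attention(alpha: List[List[float]], x: List[float]) -> List[float]:
    """Compute y = alpha @ x with plain Python lists."""
    n = len(x)
    return [sum(alpha[t][s] * x[s] for s in range(n)) for t in range(n)]
```

Applying `apply_attention(implicit_attention(a, b, c), x)` should reproduce `run_recurrence(a, b, c, x)` up to floating-point error. Each row of `alpha` can then be inspected like a row of a causal attention map: entry `alpha[t][s]` indicates how strongly input position s contributes to output position t under this simplified model.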

Code & Models

Repositories

Itamarzimm/UnifiedImplicitAttnRepr
jax · Official

Taxonomy

Topics: Neural Networks and Applications · Speech Recognition and Synthesis

Methods: Attention Is All You Need · Dense Connections · Layer Normalization · Residual Connection · Position-Wise Feed-Forward Layer · Adam · Linear Layer · Softmax · Multi-Head Attention · Dropout