A Transformer with Stack Attention

Jiaoda Li; Jennifer C. White; Mrinmaya Sachan; Ryan Cotterell

arXiv:2405.04515 · cs.CL · May 15, 2024


TL;DR

This paper introduces a stack-based attention mechanism for transformers. The mechanism lets them model some deterministic context-free languages that standard transformers struggle with, and it adds a degree of interpretability.

Contribution

The paper proposes a novel differentiable stack-based attention mechanism that can be integrated into transformers to improve modeling of context-free languages.
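This page does not spell out how the stack is made differentiable. One widely used construction is sketched below only as a hedged illustration, not as the authors' formulation: a fixed-depth "superposition" stack. Instead of choosing push, pop, or no-op discretely, each step blends the three possible outcomes using probabilities that a network would predict. The function name `soft_stack_step`, the fixed depth, and the top-at-row-0 layout are all assumptions made for this sketch.

```python
from typing import List

Vector = List[float]


def soft_stack_step(stack: List[Vector], probs: Vector, value: Vector) -> List[Vector]:
    """One update of a hypothetical fixed-depth superposition stack.

    stack: depth rows, each a d-dim vector; row 0 is the top of the stack.
    probs: [p_push, p_pop, p_noop], assumed non-negative and summing to 1.
    value: d-dim vector that a push would place on top.
    Elements pushed past the bottom row are dropped (fixed depth).
    """
    p_push, p_pop, p_noop = probs
    depth = len(stack)
    zero = [0.0] * len(value)
    new_stack: List[Vector] = []
    for i in range(depth):
        # What row i would hold after each discrete action.
        pushed = value if i == 0 else stack[i - 1]          # push shifts rows down
        popped = stack[i + 1] if i + 1 < depth else zero    # pop shifts rows up
        kept = stack[i]                                     # no-op leaves row as is
        # Convex combination of the three outcomes keeps the update differentiable.
        new_stack.append([
            p_push * a + p_pop * b + p_noop * c
            for a, b, c in zip(pushed, popped, kept)
        ])
    return new_stack
```

With one-hot probabilities the update reduces to an ordinary push, pop, or no-op. With fractional probabilities, for example 0.5 pop and 0.5 no-op, each row becomes a weighted blend of the candidate stacks, which is what lets gradients flow back into the action probabilities.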

Findings

01

Enables transformers to model some deterministic context-free languages

02

Adds interpretability to transformer models

03

Limited to modeling some, but not all, deterministic context-free languages
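Dyck languages (balanced brackets) are a textbook example of deterministic context-free languages: a single left-to-right pass with a stack decides membership. The recognizer below is a plain, non-neural sketch of that idea. This page does not list which specific languages the paper evaluates, so Dyck-2 and the helper name `is_dyck2` are illustrative assumptions.

```python
from typing import List


def is_dyck2(s: str) -> bool:
    """Hypothetical deterministic pushdown recognizer for Dyck-2 over '()[]'."""
    matching = {")": "(", "]": "["}
    stack: List[str] = []
    for ch in s:
        if ch in "([":
            stack.append(ch)  # push every opening bracket
        elif ch in matching:
            # A closing bracket must match the most recent unmatched opener.
            if not stack or stack[-1] != matching[ch]:
                return False
            stack.pop()
        else:
            raise ValueError(f"unexpected symbol {ch!r}")
    return not stack  # accept only if every opener was closed
```

For example, `is_dyck2("([])[]")` accepts, while `is_dyck2("([)]")` rejects because `)` arrives when `[` is on top of the stack.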

Abstract

Natural languages are believed to be (mildly) context-sensitive. Despite underpinning remarkably capable large language models, transformers are unable to model many context-free language tasks. In an attempt to address this limitation in the modeling power of transformer-based language models, we propose augmenting them with a differentiable, stack-based attention mechanism. Our stack-based attention mechanism can be incorporated into any transformer-based language model and adds a level of interpretability to the model. We show that the addition of our stack-based attention mechanism enables the transformer to model some, but not all, deterministic context-free languages.


Code & Models

Repositories

rycolab/stack-transformer (JAX, official)

Videos

A Transformer with Stack Attention (Underline)
