A Transformer with Stack Attention
Jiaoda Li, Jennifer C. White, Mrinmaya Sachan, Ryan Cotterell

TL;DR
This paper introduces a differentiable, stack-based attention mechanism for transformers. It improves their ability to model some deterministic context-free languages, a class of tasks on which standard transformers often fail, and adds a level of interpretability.
Contribution
The paper proposes a novel differentiable stack-based attention mechanism that can be integrated into transformers to improve modeling of context-free languages.
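To make "differentiable stack" concrete, here is a minimal sketch of one soft stack update. It is a generic superposition-style stack, not the paper's actual mechanism: the function names, the fixed depth, and the three-way push/pop/no-op weighting are all assumptions made for illustration. Each slot of the new stack is a weighted blend of what it would hold after a hard push, a hard pop, or no operation, so the update stays differentiable with respect to the weights.

```python
from typing import List

Vector = List[float]


def empty_stack(depth: int, dim: int) -> List[Vector]:
    """Hypothetical helper: a fixed-depth stack of zero vectors."""
    return [[0.0] * dim for _ in range(depth)]


def soft_stack_step(
    stack: List[Vector],
    value: Vector,
    push: float,
    pop: float,
    noop: float,
) -> List[Vector]:
    """Blend the results of push, pop, and no-op (weights assumed to sum to 1).

    Slot 0 is the top of the stack. This is an illustrative sketch,
    not the update rule used in the paper.
    """
    depth = len(stack)
    zeros = [0.0] * len(value)
    new_stack = []
    for i in range(depth):
        # After a push, slot i holds what used to be one slot above it
        # (or the new value, for the top slot).
        after_push = value if i == 0 else stack[i - 1]
        # After a pop, slot i holds what used to be one slot below it
        # (or zeros, for the bottom slot).
        after_pop = stack[i + 1] if i + 1 < depth else zeros
        new_stack.append(
            [
                push * a + pop * b + noop * s
                for a, b, s in zip(after_push, after_pop, stack[i])
            ]
        )
    return new_stack
```

With weights of exactly 1 or 0 this behaves like an ordinary stack; with fractional weights the top of the stack becomes a mixture of candidate states, which is what allows gradients to flow through the choice of operation.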
Findings
Enables transformers to model some deterministic context-free languages
Adds interpretability to transformer models
Limited to some, but not all, deterministic context-free languages
Abstract
Natural languages are believed to be (mildly) context-sensitive. Despite underpinning remarkably capable large language models, transformers are unable to model many context-free language tasks. In an attempt to address this limitation in the modeling power of transformer-based language models, we propose augmenting them with a differentiable, stack-based attention mechanism. Our stack-based attention mechanism can be incorporated into any transformer-based language model and adds a level of interpretability to the model. We show that the addition of our stack-based attention mechanism enables the transformer to model some, but not all, deterministic context-free languages.
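As a reminder of what a deterministic context-free language looks like, the sketch below recognizes balanced brackets (a Dyck language) with an explicit stack. The choice of Dyck as the example and the names `PAIRS` and `is_balanced` are assumptions for illustration; the page does not say which languages the paper evaluates on.

```python
# Map each closing bracket to the opening bracket it must match.
PAIRS = {")": "(", "]": "["}


def is_balanced(s: str) -> bool:
    """Return True if s is a well-nested string over the brackets ()[]."""
    stack = []
    for ch in s:
        if ch in PAIRS.values():
            # Opening bracket: remember it until its partner arrives.
            stack.append(ch)
        elif ch in PAIRS:
            # Closing bracket: it must match the most recent unmatched opener.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
        else:
            # Any other symbol is outside the alphabet of this language.
            return False
    # Every opener must have been closed.
    return not stack
```

The recognizer reads the input once, left to right, and only ever inspects the top of the stack. That access pattern is what a stack-based attention mechanism is meant to give a transformer.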
Taxonomy
Topics: Sensor Technology and Measurement Systems
