StackTrans: From Large Language Model to Large Pushdown Automata Model
Kechi Zhang, Ge Li, Jia Li, Huangzhao Zhang, Yihong Dong, Jia Li, Jingjing Xu, Zhi Jin

TL;DR
StackTrans introduces a novel Transformer extension that explicitly incorporates stack operations inspired by pushdown automata, enabling better modeling of context-free grammars and improving performance on language tasks.
Contribution
It proposes a new architecture, StackTrans, that integrates differentiable stack operations into Transformers, enhancing their ability to handle context-free structures and enabling them to outperform existing models.
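The phrase "differentiable stack operations" can be illustrated with a continuous (soft) stack in the style of Joulin and Mikolov's stack-augmented RNNs (2015). This is a swapped-in, generic technique and not StackTrans's own update rule, which this summary does not describe. At each step the stack receives non-negative push, pop and no-op weights that sum to 1 and returns the weighted mixture of the three discrete outcomes. The fixed depth, the list-of-vectors representation and the name `soft_stack_update` are assumptions made for illustration.

```python
from typing import List

Vector = List[float]


def soft_stack_update(
    stack: List[Vector],
    value: Vector,
    p_push: float,
    p_pop: float,
    p_noop: float,
) -> List[Vector]:
    """Blend the push, pop and no-op outcomes of a fixed-depth stack.

    Hypothetical helper: a generic soft stack, not the StackTrans implementation.
    stack[0] is the top cell, and every cell is a vector of the same width.
    The three action weights are assumed to be non-negative and to sum to 1
    (e.g. a softmax output). The result is therefore a convex mixture of the
    three discrete outcomes, and each cell is a weighted sum of existing cells.
    """
    assert abs(p_push + p_pop + p_noop - 1.0) < 1e-6, "weights must sum to 1"
    depth, width = len(stack), len(value)
    zero = [0.0] * width

    def cell(i: int) -> Vector:
        # Reading past the bottom of the stack yields an empty (all-zero) slot.
        return stack[i] if 0 <= i < depth else zero

    new_stack: List[Vector] = []
    for i in range(depth):
        pushed = value if i == 0 else cell(i - 1)  # push: shift down, write value on top
        popped = cell(i + 1)                       # pop: shift every cell up by one
        kept = cell(i)                             # no-op: leave the cell unchanged
        new_stack.append([
            p_push * a + p_pop * b + p_noop * c
            for a, b, c in zip(pushed, popped, kept)
        ])
    return new_stack
```

With one-hot weights such as (1, 0, 0), the update reduces to an exact push. Fractional weights such as (0, 0.5, 0.5) instead produce a blend of the popped and unchanged stacks. Because every cell is a sum of existing cells scaled by the action weights, gradients can flow back into whatever network predicts those weights.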
Findings
Outperforms standard Transformers on Chomsky hierarchy benchmarks
Scales from 360M to 7B parameters with consistent improvements
Pretrained StackTrans-360M surpasses larger open-source LLMs
Abstract
The Transformer architecture has emerged as a landmark advancement within the broad field of artificial intelligence, effectively catalyzing the advent of large language models (LLMs). However, despite its remarkable capabilities and the substantial progress it has facilitated, the Transformer architecture still has some limitations. One such intrinsic limitation is its inability to effectively capture languages from the Chomsky hierarchy, such as regular expressions or deterministic context-free grammars. Drawing inspiration from pushdown automata, which efficiently resolve deterministic context-free grammars using stacks, we propose StackTrans to address the aforementioned issue within LLMs. Unlike previous approaches that modify the attention computation, StackTrans explicitly incorporates hidden state stacks between Transformer layers. This design maintains compatibility with existing frameworks…
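The abstract's motivating observation is that a pushdown automaton handles deterministic context-free languages with a stack. This can be made concrete with the Dyck language of balanced brackets. The sketch below is a hand-written recognizer and is not part of StackTrans. The name `accepts_dyck` and the three-bracket alphabet are illustrative choices. A single integer counter would be enough for one bracket type. With several types, however, the recognizer must remember the order of the open brackets, and a stack provides exactly that.

```python
from typing import List

# Illustrative alphabet: each closing bracket maps to its opening partner.
PAIRS = {")": "(", "]": "[", "}": "{"}


def accepts_dyck(text: str) -> bool:
    """Recognize balanced brackets in one left-to-right pass using a stack.

    Behaves like a deterministic pushdown automaton: opening brackets are
    pushed, each closing bracket must match and pop the current top, and the
    input is accepted only if the stack is empty at the end. Any character
    outside the bracket alphabet causes rejection.
    """
    stack: List[str] = []
    for ch in text:
        if ch in "([{":
            stack.append(ch)        # push the opening symbol
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False        # mismatched pair, or pop from an empty stack
        else:
            return False            # symbol outside the alphabet
    return not stack                # accept only when every bracket was closed
```

For example, `"([]{})"` is accepted. `"(]"` and `"(()"` are rejected: the first fails on a mismatched pair, and the second ends with an unclosed bracket still on the stack.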
Taxonomy
Topics: Natural Language Processing Techniques · Machine Learning and Algorithms · Topic Modeling
