Learning Hierarchical Structures with Differentiable Nondeterministic Stacks
Brian DuSell, David Chiang

TL;DR
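This paper improves the Nondeterministic Stack RNN (NS-RNN) with two changes: stack actions receive unnormalized positive weights instead of probabilities, and the PDA state can be observed directly. With these changes the model learns hierarchical structure in sequential data more reliably, reaching cross-entropy within 0.05 nats of the information-theoretic lower bound on five language modeling tasks.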
Contribution
The paper advances the NS-RNN model by replacing probabilities with unnormalized weights and allowing direct PDA state observation, enhancing its ability to learn hierarchical structures.
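The difference between probabilities and unnormalized weights can be illustrated with a minimal sketch. This summary does not give the paper's actual parameterization, so the functions below, the use of an elementwise exponential, and the example scores are all assumptions made for illustration only.

```python
import math

def normalized_action_weights(scores):
    """Softmax: the weights are probabilities and must sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def unnormalized_action_weights(scores):
    """Elementwise exp (an assumed choice): positive weights, no sum constraint."""
    return [math.exp(s) for s in scores]

# Hypothetical scores for three stack actions (e.g. push, replace, pop).
scores = [2.0, 0.5, -1.0]
probs = normalized_action_weights(scores)
weights = unnormalized_action_weights(scores)
```

Both versions keep the same ratios between actions. What changes is that the unnormalized weights are free to grow or shrink in total, rather than being forced to share a fixed budget of 1 at every step.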
Findings
Achieves lower cross-entropy than previous stack RNNs on five language modeling tasks.
Performs within 0.05 nats of the information-theoretic lower bound.
Successfully models hierarchical structures in long sequences.
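To make the unit in the second finding concrete, here is a small sketch of how a per-symbol cross-entropy gap in nats can be computed. The probabilities are hypothetical and are not results from the paper; the helper name `cross_entropy_nats` is also an assumption.

```python
import math

def cross_entropy_nats(symbol_probs):
    """Average negative log-probability per symbol, using the natural log (nats)."""
    return -sum(math.log(p) for p in symbol_probs) / len(symbol_probs)

# Hypothetical probabilities assigned to each symbol of one sequence.
true_probs = [0.5, 0.25, 0.5, 0.25]     # under the true generating distribution
model_probs = [0.48, 0.24, 0.49, 0.26]  # under a trained model

lower_bound = cross_entropy_nats(true_probs)
model_ce = cross_entropy_nats(model_probs)
gap = model_ce - lower_bound  # how far the model is from the bound, in nats
```

In this toy case the gap is positive and below 0.05 nats. On a single finite sample a model can occasionally score below the true distribution, so a bound of this kind is meaningful in expectation over many sequences.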
Abstract
Learning hierarchical structures in sequential data -- from simple algorithmic patterns to natural language -- in a reliable, generalizable way remains a challenging problem for neural language models. Past work has shown that recurrent neural networks (RNNs) struggle to generalize on held-out algorithmic or syntactic patterns without supervision or some inductive bias. To remedy this, many papers have explored augmenting RNNs with various differentiable stacks, by analogy with finite automata and pushdown automata (PDAs). In this paper, we improve the performance of our recently proposed Nondeterministic Stack RNN (NS-RNN), which uses a differentiable data structure that simulates a nondeterministic PDA, with two important changes. First, the model now assigns unnormalized positive weights instead of probabilities to stack actions, and we provide an analysis of why this improves…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
