Compositional Generalization with Tree Stack Memory Units
Forough Arabshahi, Zhichu Lu, Pranay Mundra, Sameer Singh, Animashree Anandkumar

TL;DR
This paper introduces Tree Stack Memory Units (Tree-SMU), a recursive neural network with differentiable stack memory, designed to improve compositional generalization in neural networks, especially for mathematical reasoning tasks.
Contribution
The paper proposes Tree-SMU, a novel recursive neural network built from Stack Memory Units, which improves zero-shot compositional generalization relative to existing models such as Transformers and Tree-LSTMs.
Findings
Tree-SMU outperforms baselines such as Transformers and Tree-LSTMs on mathematical reasoning benchmarks.
It generalizes zero-shot to novel compositions of concepts in compositionality tests.
Stack memory captures long-range dependencies effectively.
Abstract
We study compositional generalization, viz., the problem of zero-shot generalization to novel compositions of concepts in a domain. Standard neural networks fail to a large extent on compositional learning. We propose Tree Stack Memory Units (Tree-SMU) to enable strong compositional generalization. Tree-SMU is a recursive neural network with Stack Memory Units (SMUs), a novel memory-augmented neural network whose memory has a differentiable stack structure. Each SMU in the tree architecture learns to read from its stack and to write to it by combining the stacks and states of its children through gating. The stack helps capture long-range dependencies in the problem domain, thereby enabling compositional generalization. The stack also preserves the ordering of each node's descendants, thereby retaining locality on the tree. We demonstrate strong empirical results on two…
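The read/write mechanism described in the abstract can be illustrated with a minimal sketch of a differentiable ("soft") stack. This is not the authors' formulation: the function names (`soft_update`, `merge_children`, `empty_stack`), the fixed stack depth, and the hand-set gate values are all assumptions made for illustration. In a trained model the gates would be produced by learned layers rather than passed in as constants.

```python
from typing import List

Vector = List[float]
Stack = List[Vector]  # index 0 is the top of the stack


def empty_stack(depth: int, width: int) -> Stack:
    """A fixed-depth stack filled with zero vectors (hypothetical helper)."""
    return [[0.0] * width for _ in range(depth)]


def soft_update(stack: Stack, value: Vector, p_push: float, p_pop: float) -> Stack:
    """Blend the push, pop, and no-op results of a stack update.

    The three weights sum to 1, with p_noop = 1 - p_push - p_pop, so the
    result is a weighted average of the three candidate stacks.
    """
    p_noop = 1.0 - p_push - p_pop
    depth, width = len(stack), len(value)
    pushed = [value] + stack[:-1]          # entries shift down; the bottom is dropped
    popped = stack[1:] + [[0.0] * width]   # entries shift up; the bottom is zero-padded
    return [
        [p_push * pushed[i][j] + p_pop * popped[i][j] + p_noop * stack[i][j]
         for j in range(width)]
        for i in range(depth)
    ]


def merge_children(left: Stack, right: Stack, gate_left: float, gate_right: float) -> Stack:
    """Gated elementwise combination of two child stacks (assumed scalar gates)."""
    return [
        [gate_left * l + gate_right * r for l, r in zip(l_row, r_row)]
        for l_row, r_row in zip(left, right)
    ]


# Two leaf nodes each push their own state onto an empty stack.
left = soft_update(empty_stack(3, 2), [1.0, 0.0], p_push=1.0, p_pop=0.0)
right = soft_update(empty_stack(3, 2), [0.0, 1.0], p_push=1.0, p_pop=0.0)

# The parent combines its children's stacks through gates, then pushes its own state.
parent = merge_children(left, right, gate_left=0.5, gate_right=0.5)
parent = soft_update(parent, [2.0, 2.0], p_push=1.0, p_pop=0.0)
```

Because every operation is a weighted sum, gradients can flow through the gate values, which is what makes such a stack trainable end to end. The parent's stack keeps its own state on top and the blended child information below it, a simplified picture of how a stack can retain information about a node's descendants.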
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Semantic Web and Ontologies
