Differentiable Tree Operations Promote Compositional Generalization
Paul Soulos, Edward Hu, Kate McCurdy, Yunmo Chen, Roland Fernandez, Paul Smolensky, Jianfeng Gao

TL;DR
This paper introduces a differentiable tree interpreter and a novel architecture, the Differentiable Tree Machine (DTM), that significantly improves compositional generalization in structure-to-structure transformation tasks, achieving perfect accuracy on synthetic benchmarks.
Contribution
The paper proposes a differentiable tree interpreter and the DTM architecture, which together enable end-to-end learning of symbolic tree operations and enhance compositional generalization.
Findings
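DTM achieves 100% on out-of-distribution compositional generalization across synthetic semantic parsing and language generation tasks.
Transformer, Tree Transformer, LSTM, and Tree2Tree LSTM baselines achieve less than 30% on the same tasks.
DTM maintains high interpretability while delivering this performance.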
Abstract
In the context of structure-to-structure transformation tasks, learning sequences of discrete symbolic operations poses significant challenges due to their non-differentiability. To facilitate the learning of these symbolic sequences, we introduce a differentiable tree interpreter that compiles high-level symbolic tree operations into subsymbolic matrix operations on tensors. We present a novel Differentiable Tree Machine (DTM) architecture that integrates our interpreter with an external memory and an agent that learns to sequentially select tree operations to execute the target transformation in an end-to-end manner. With respect to out-of-distribution compositional generalization on synthetic semantic parsing and language generation tasks, DTM achieves 100% while existing baselines such as Transformer, Tree Transformer, LSTM, and Tree2Tree LSTM achieve less than 30%. DTM remains…
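The idea of compiling symbolic tree operations into matrix operations can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the encoding, so the sketch assumes a tree stored as a matrix with one row per node position (one-hot roles) and one column per symbol. The names `left_subtree`, `right_subtree`, `build`, and `soft_step` are hypothetical. Under these assumptions, each tree operation is a fixed matrix product, and a softmax-weighted mix of operations stands in for the agent's differentiable choice.

```python
# Illustrative sketch only: the one-hot encoding and all names are assumptions.
import math
from itertools import product

DEPTH = 3                    # number of tree levels the encoding can hold
SYMBOLS = ["A", "B", "C"]    # hypothetical node-label vocabulary

# A node position is a path of 0 (left) / 1 (right) steps from the root.
PATHS = [p for d in range(DEPTH) for p in product((0, 1), repeat=d)]
INDEX = {p: i for i, p in enumerate(PATHS)}
N = len(PATHS)

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def blend(weights, trees):
    """Weighted sum of equally shaped tree matrices."""
    return [[sum(w * t[r][c] for w, t in zip(weights, trees))
             for c in range(len(SYMBOLS))] for r in range(N)]

def shift(branch, push_down):
    """Role matrix moving content between position p and (branch,) + p."""
    m = [[0.0] * N for _ in range(N)]
    for p in PATHS:
        q = (branch,) + p
        if q in INDEX:  # positions deeper than DEPTH are dropped
            if push_down:
                m[INDEX[q]][INDEX[p]] = 1.0   # building: p -> q
            else:
                m[INDEX[p]][INDEX[q]] = 1.0   # extracting: q -> p
    return m

def encode(tree):
    """Encode a nested (symbol, left, right) tuple; must fit within DEPTH levels."""
    t = [[0.0] * len(SYMBOLS) for _ in range(N)]
    def visit(node, path):
        if node is not None:
            sym, left, right = node
            t[INDEX[path]][SYMBOLS.index(sym)] = 1.0
            visit(left, path + (0,))
            visit(right, path + (1,))
    visit(tree, ())
    return t

def left_subtree(t):
    return matmul(shift(0, False), t)

def right_subtree(t):
    return matmul(shift(1, False), t)

def build(sym, left, right):
    """New tree with root `sym` and the two arguments as children."""
    root = encode((sym, None, None))
    return blend([1.0, 1.0, 1.0],
                 [root, matmul(shift(0, True), left), matmul(shift(1, True), right)])

def soft_step(logits, t):
    """Softmax-weighted mix of the two extraction operations."""
    exps = [math.exp(x) for x in logits]
    weights = [e / sum(exps) for e in exps]
    return blend(weights, [left_subtree(t), right_subtree(t)])

tree = encode(("A", ("B", None, None), ("C", None, None)))
```

With sharply peaked weights, `soft_step` approaches a single discrete operation. With equal logits it returns an even superposition of both subtrees, which is what keeps the selection differentiable in this sketch.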
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Network Packet Processing and Optimization
Methods: Multi-Head Attention · Attention Is All You Need · Residual Connection · Linear Layer · Layer Normalization · Byte Pair Encoding · Softmax · Label Smoothing · Adam · Absolute Position Encodings
