ORCHARD: A Benchmark For Measuring Systematic Generalization of Multi-Hierarchical Reasoning
Bill Tuck Weng Pung, Alvin Chan

TL;DR
This paper introduces ORCHARD, a diagnostic dataset for evaluating how well state-of-the-art sequence models such as Transformers and LSTMs generalize on hierarchical reasoning tasks that involve multiple structures; the results reveal significant limitations.
Contribution
It presents a novel benchmark for assessing systematic generalization in multi-hierarchical reasoning, highlighting the models' failure to handle complex hierarchical reasoning tasks.
Findings
Transformers and LSTMs fail to generalize systematically in hierarchical reasoning.
Increased references between hierarchies do not improve Transformer performance.
Models struggle with reasoning involving multiple explicit hierarchical structures.
Abstract
The ability to reason with multiple hierarchical structures is an attractive and desirable property of sequential inductive biases for natural language processing. Do state-of-the-art Transformer and LSTM architectures implicitly encode these biases? To answer this, we propose ORCHARD, a diagnostic dataset for systematically evaluating hierarchical reasoning in state-of-the-art neural sequence models. While prior evaluation frameworks such as ListOps or Logical Inference exist, our work presents a novel and more natural setting in which models learn to reason with multiple explicit hierarchical structures instead of only one. This requires long-term sequence memorization and relational reasoning while also reasoning over hierarchical structure. Backed by a set of rigorous experiments, we show that (1) Transformer and LSTM models…
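The abstract contrasts ORCHARD with ListOps, where a model must evaluate a single nested (hierarchical) expression. As a rough, hypothetical illustration of what reasoning over one hierarchical structure involves in that prior setting, the sketch below evaluates a ListOps-style bracketed prefix expression with a recursive parser. It implements only MAX, MIN, and SM (sum modulo 10), a subset chosen for brevity. The function names are made up for illustration, and the code does not reproduce the ORCHARD task format, which is not described here.

```python
from typing import List

# Hypothetical ListOps-style evaluator (NOT the ORCHARD format).
# Only a subset of operators is implemented: MAX, MIN and SM (sum modulo 10).
OPS = {
    "MAX": max,
    "MIN": min,
    "SM": lambda xs: sum(xs) % 10,
}


def tokenize(expr: str) -> List[str]:
    # Pad brackets with spaces so split() separates every token.
    return expr.replace("[", " [ ").replace("]", " ] ").split()


def evaluate(expr: str) -> int:
    """Evaluate a bracketed prefix expression such as '[MAX 2 [MIN 4 7 ] 0 ]'."""
    tokens = tokenize(expr)
    pos = 0

    def parse() -> int:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "[":
            return int(tok)  # leaf: a single digit
        op = tokens[pos]
        pos += 1
        args = []
        while tokens[pos] != "]":
            args.append(parse())  # recurse into nested sub-expressions
        pos += 1  # consume the closing bracket
        return OPS[op](args)

    result = parse()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return result
```

For example, `evaluate("[MAX 2 9 [MIN 4 7 ] 0 ]")` returns 9: the inner MIN reduces to 4 before the outer MAX is applied. A sequence model has to capture this nesting implicitly from tokens alone, and ORCHARD's setting adds multiple such structures that must be related to one another.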
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Sigmoid Activation · Label Smoothing · Softmax · Residual Connection · Layer Normalization · Adam
