ORCHARD: A Benchmark For Measuring Systematic Generalization of Multi-Hierarchical Reasoning

Bill Tuck Weng Pung; Alvin Chan

arXiv:2111.14034·cs.CL·November 30, 2021

TL;DR

This paper introduces ORCHARD, a diagnostic dataset to evaluate how well state-of-the-art models like Transformers and LSTMs can generalize in hierarchical reasoning tasks involving multiple structures, revealing significant limitations.

Contribution

The paper presents a novel benchmark for assessing systematic generalization in multi-hierarchical reasoning and uses it to show that Transformer and LSTM models fail on tasks that require reasoning over several hierarchical structures at once.

Findings

01 Transformers and LSTMs fail to generalize systematically in hierarchical reasoning.

02 Increased references between hierarchies do not improve Transformer performance.

03 Models struggle with reasoning involving multiple explicit hierarchical structures.

Abstract

The ability to reason with multiple hierarchical structures is an attractive and desirable property of sequential inductive biases for natural language processing. Do state-of-the-art Transformer and LSTM architectures implicitly encode these biases? To answer this, we propose ORCHARD, a diagnostic dataset for systematically evaluating hierarchical reasoning in state-of-the-art neural sequence models. While there have been prior evaluation frameworks such as ListOps or Logical Inference, our work presents a novel and more natural setting where our models learn to reason with multiple explicit hierarchical structures instead of only one, i.e., a setting that requires long-term sequence memorization and relational reasoning alongside reasoning with hierarchical structure. Consequently, backed by a set of rigorous experiments, we show that (1) Transformer and LSTM models…
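
To make the single-hierarchy setting concrete, the sketch below evaluates a ListOps-style bracketed expression, the kind of task the abstract cites as prior work. It is an illustration only: it does not reproduce ORCHARD's data format, which this page does not describe, and the helper names (`tokenize`, `parse`, `evaluate`) and the reduced operator set are assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

# Illustrative subset of ListOps-style operators (assumed for this sketch):
# MAX, MIN, and SM (sum modulo 10).
OPS: Dict[str, Callable[[List[int]], int]] = {
    "MAX": max,
    "MIN": min,
    "SM": lambda xs: sum(xs) % 10,
}


def tokenize(expr: str) -> List[str]:
    """Split a bracketed expression into tokens, isolating '[' and ']'."""
    return expr.replace("[", " [ ").replace("]", " ] ").split()


def parse(tokens: List[str], pos: int = 0) -> Tuple[int, int]:
    """Evaluate the sub-expression starting at `pos`.

    Returns the value and the index of the first token after it.
    """
    tok = tokens[pos]
    if tok != "[":
        return int(tok), pos + 1  # a leaf: a single digit
    op = OPS[tokens[pos + 1]]     # the operator follows the opening bracket
    pos += 2
    args: List[int] = []
    while tokens[pos] != "]":
        value, pos = parse(tokens, pos)  # recurse into nested brackets
        args.append(value)
    return op(args), pos + 1      # skip the closing bracket


def evaluate(expr: str) -> int:
    """Evaluate a full ListOps-style expression string."""
    value, _ = parse(tokenize(expr))
    return value


if __name__ == "__main__":
    print(evaluate("[MAX 2 9 [MIN 4 7 ] 0 ]"))
    print(evaluate("[SM 5 [MAX 1 8 ] 9 ]"))
```

A sequence model sees only the flat token sequence and has to recover the nesting implicitly, whereas the recursive evaluator above is handed the hierarchy by the brackets. Per the abstract, ORCHARD extends this idea to multiple explicit hierarchies rather than one.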

Code & Models

Repositories

billptw/orchard (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Sigmoid Activation · Label Smoothing · Softmax · Residual Connection · Layer Normalization · Adam