Reasoning about Goals, Steps, and Temporal Ordering with WikiHow

Li Zhang; Qing Lyu; Chris Callison-Burch

arXiv:2009.07690 · cs.CL · December 15, 2020

TL;DR

This paper introduces a wikiHow-based dataset and a suite of reasoning tasks that evaluate models' understanding of goal-step and step-step temporal relations between procedural events. On the human-validated test set, state-of-the-art transformer models trail human performance by about 10% to 20%.

Contribution

The paper presents a dataset and benchmark for reasoning about goal-step and step-step temporal relations between procedural events, enabling better evaluation of commonsense inference in AI models.

Findings

01

State-of-the-art transformer models trail human performance by about 10% to 20% on the human-validated test set.

02

Models trained on the dataset transfer effectively to out-of-domain tasks.

03

Training on the dataset greatly improves zero- and few-shot performance on SWAG, Snips, and the Story Cloze Test.

Abstract

We propose a suite of reasoning tasks on two types of relations between procedural events: goal-step relations ("learn poses" is a step in the larger goal of "doing yoga") and step-step temporal relations ("buy a yoga mat" typically precedes "learn poses"). We introduce a dataset targeting these two relations based on wikiHow, a website of instructional how-to articles. Our human-validated test set serves as a reliable benchmark for commonsense inference, with a gap of about 10% to 20% between the performance of state-of-the-art transformer models and human performance. Our automatically-generated training set allows models to effectively transfer to out-of-domain tasks requiring knowledge of procedural events, with greatly improved performances on SWAG, Snips, and the Story Cloze Test in zero- and few-shot settings.
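To make the two relations concrete, here is a minimal Python sketch of how examples of each might be represented and scored. The class names, field names, multiple-choice framing, and distractor candidates are all assumptions made for illustration; only the yoga goal and steps come from the abstract, and this is not the dataset's actual schema or the authors' evaluation code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class GoalStepExample:
    """Hypothetical item: which candidate is a step toward the goal?"""
    goal: str
    candidates: Sequence[str]
    label: int  # index of the correct candidate


@dataclass(frozen=True)
class StepOrderExample:
    """Hypothetical item: which of two steps typically comes first?"""
    goal: str
    step_a: str
    step_b: str
    label: int  # 0 if step_a typically precedes step_b, else 1


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Goal-step relation from the abstract; the distractors are invented here.
goal_step = GoalStepExample(
    goal="doing yoga",
    candidates=["learn poses", "preheat the oven", "change a tire"],
    label=0,
)

# Step-step temporal relation from the abstract.
step_order = StepOrderExample(
    goal="doing yoga",
    step_a="buy a yoga mat",
    step_b="learn poses",
    label=0,
)
```

Under this framing, the reported gap between models and humans would be the difference between two such accuracy scores computed on the same test items.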

Code & Models

Repositories

zharry29/wikihow-goal-step (PyTorch, official)
