PIQA: Reasoning about Physical Commonsense in Natural Language

Yonatan Bisk; Rowan Zellers; Ronan Le Bras; Jianfeng Gao; Yejin Choi

arXiv:1911.11641·cs.CL·November 27, 2019

TL;DR

This paper introduces PIQA, a benchmark dataset for evaluating AI's ability to reason about physical commonsense in natural language, highlighting current models' limitations and opportunities for future improvement.

Contribution

The paper presents a new dataset and task for physical commonsense reasoning, revealing the gap between human and AI performance and analyzing the knowledge deficiencies of existing models.

Findings

01. Humans achieve 95% accuracy on PIQA.

02. Pretrained models achieve only 77% accuracy.

03. Existing models lack key physical commonsense knowledge.

Abstract

To apply eyeshadow without a brush, should I use a cotton swab or a toothpick? Questions requiring this kind of physical commonsense pose a challenge to today's natural language understanding systems. While recent pretrained models (such as BERT) have made progress on question answering over more abstract domains - such as news articles and encyclopedia entries, where text is plentiful - in more physical domains, text is inherently limited due to reporting bias. Can AI systems learn to reliably answer physical common-sense questions without experiencing the physical world? In this paper, we introduce the task of physical commonsense reasoning and a corresponding benchmark dataset Physical Interaction: Question Answering or PIQA. Though humans find the dataset easy (95% accuracy), large pretrained models struggle (77%). We provide analysis about the dimensions of knowledge that existing…
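The accuracy figures quoted above can be made concrete with a small sketch. It assumes each benchmark item pairs a goal with two candidate solutions and a label marking the correct one, as the eyeshadow question suggests. The record type `PhysicalQA`, its field names, the `always_first` baseline, and the toy labels are all hypothetical illustrations, not the dataset's actual schema or contents.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class PhysicalQA:
    """Hypothetical record for one two-choice item (field names are assumptions)."""
    goal: str
    solutions: Tuple[str, str]
    label: int  # index (0 or 1) of the solution treated as correct


def accuracy(items: Sequence[PhysicalQA],
             choose: Callable[[PhysicalQA], int]) -> float:
    """Return the fraction of items for which `choose` picks the labeled index."""
    if not items:
        raise ValueError("need at least one item")
    correct = sum(1 for item in items if choose(item) == item.label)
    return correct / len(items)


def always_first(item: PhysicalQA) -> int:
    """Trivial baseline that ignores the text and picks the first solution."""
    return 0


# Toy items built from the abstract's example question; labels are illustrative.
items = [
    PhysicalQA("apply eyeshadow without a brush",
               ("use a cotton swab", "use a toothpick"), 0),
    PhysicalQA("apply eyeshadow without a brush",
               ("use a toothpick", "use a cotton swab"), 1),
]

baseline_accuracy = accuracy(items, always_first)
```

A real evaluation would replace `always_first` with a function that scores both solutions with a model and returns the index of the higher-scoring one; the accuracy computation itself stays the same.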
