COM2SENSE: A Commonsense Reasoning Benchmark with Complementary Sentences

Shikhar Singh; Nuan Wen; Yu Hou; Pegah Alipoormolabashi; Te-Lin Wu; Xuezhe Ma; Nanyun Peng

arXiv:2106.00969·cs.CL·June 3, 2021

Open Access · 1 repository

TL;DR

COM2SENSE introduces a new, challenging benchmark dataset of complementary sentence pairs for evaluating and analyzing the commonsense reasoning abilities of AI systems across a range of knowledge domains and scenarios.

Contribution

The paper presents a novel dataset of complementary sentence pairs, a pairwise accuracy metric, and an adversarial model-in-the-loop data collection setup, all aimed at better assessing and understanding commonsense reasoning in AI models.

Findings

01

Baseline models perform significantly below human accuracy.

02

The dataset reveals gaps in current AI commonsense reasoning capabilities.

03

Complementary sentence pairs provide a stricter test of the robustness of a model's reasoning.

Abstract

Commonsense reasoning is intuitive for humans but has been a long-term challenge for artificial intelligence (AI). Recent advancements in pretrained language models have shown promising results on several commonsense benchmark datasets. However, the reliability and comprehensiveness of these benchmarks for assessing a model's commonsense reasoning ability remain unclear. To this end, we introduce a new commonsense reasoning benchmark dataset comprising natural language true/false statements, with each sample paired with its complementary counterpart, resulting in 4k sentence pairs. We propose a pairwise accuracy metric to reliably measure an agent's ability to perform commonsense reasoning over a given situation. The dataset is crowdsourced and enhanced with an adversarial model-in-the-loop setup to incentivize challenging samples. To facilitate a systematic analysis of commonsense…
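The abstract describes a pairwise accuracy metric computed over complementary sentence pairs. As a minimal sketch of how it might be computed alongside standard per-sentence accuracy, the code assumes each pair is stored as a tuple of two boolean labels (True for a plausible statement, False otherwise) and that a pair counts as correct only when both of its statements are classified correctly. The names `standard_accuracy`, `pairwise_accuracy`, and the tuple layout are hypothetical and are not taken from the PlusLabNLP/Com2Sense repository.

```python
from typing import List, Tuple

# Hypothetical layout: labels for (statement_a, statement_b) in one complementary pair.
Pair = Tuple[bool, bool]


def _check(gold: List[Pair], pred: List[Pair]) -> None:
    # Both lists must describe the same non-empty set of pairs.
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equally long")


def standard_accuracy(gold: List[Pair], pred: List[Pair]) -> float:
    """Fraction of individual statements classified correctly."""
    _check(gold, pred)
    correct = sum((g[0] == p[0]) + (g[1] == p[1]) for g, p in zip(gold, pred))
    return correct / (2 * len(gold))


def pairwise_accuracy(gold: List[Pair], pred: List[Pair]) -> float:
    """Fraction of pairs in which BOTH statements are classified correctly
    (assumed definition of the pairwise metric)."""
    _check(gold, pred)
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold)


# Usage: assume each pair holds one True and one False statement.
gold = [(True, False), (False, True), (True, False), (False, True)]
always_true = [(True, True)] * len(gold)
print(standard_accuracy(gold, always_true))  # 0.5
print(pairwise_accuracy(gold, always_true))  # 0.0
```

Under the assumption that the two statements in each pair carry opposite labels, a degenerate model that always answers True still earns 0.5 standard accuracy but 0.0 pairwise accuracy, which illustrates why a pair-level metric is the stricter measure of consistent reasoning.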

Code & Models

Repositories

PlusLabNLP/Com2Sense
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications