COM2SENSE: A Commonsense Reasoning Benchmark with Complementary Sentences
Shikhar Singh, Nuan Wen, Yu Hou, Pegah Alipoormolabashi, Te-Lin Wu, Xuezhe Ma, Nanyun Peng

TL;DR
COM2SENSE introduces a new challenging benchmark dataset with complementary sentence pairs to evaluate and analyze AI's commonsense reasoning abilities across various knowledge domains and scenarios.
Contribution
The paper presents a novel dataset with complementary sentence pairs, a pairwise accuracy metric, and an adversarial setup to better assess and understand AI commonsense reasoning.
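To make the pairwise accuracy metric concrete, here is a minimal sketch. It assumes that a pair counts as correct only when both the statement and its complementary counterpart are classified correctly; this summary does not spell out that definition. The function names and the toy data are hypothetical and do not come from the paper's code.

```python
from typing import Sequence, Tuple


def standard_accuracy(preds: Sequence[bool], labels: Sequence[bool]) -> float:
    """Fraction of individual statements classified correctly."""
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)


def pairwise_accuracy(
    preds: Sequence[bool],
    labels: Sequence[bool],
    pairs: Sequence[Tuple[int, int]],
) -> float:
    """Fraction of complementary pairs where BOTH members are correct.

    `pairs` holds (index, index) tuples that link a statement to its
    complement. This is an assumed definition of the metric.
    """
    both_correct = sum(
        preds[i] == labels[i] and preds[j] == labels[j] for i, j in pairs
    )
    return both_correct / len(pairs)


# Hypothetical toy data: four statements forming two complementary pairs.
labels = [True, False, True, False]
preds = [True, False, True, True]  # the last statement is misclassified
pairs = [(0, 1), (2, 3)]

std_acc = standard_accuracy(preds, labels)
pair_acc = pairwise_accuracy(preds, labels, pairs)
print(std_acc, pair_acc)
```

In this toy case three of four statements are correct, but only one of the two pairs is fully correct, so the pairwise score is lower than the per-statement score. Under this assumed definition, a model cannot score well by getting only one side of each pair right.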
Findings
Baseline models perform significantly below human accuracy.
The dataset reveals gaps in current AI commonsense reasoning capabilities.
Complementary sentence pairs challenge models to improve reasoning robustness.
Abstract
Commonsense reasoning is intuitive for humans but has been a long-term challenge for artificial intelligence (AI). Recent advancements in pretrained language models have shown promising results on several commonsense benchmark datasets. However, the reliability and comprehensiveness of these benchmarks for assessing a model's commonsense reasoning ability remain unclear. To this end, we introduce a new commonsense reasoning benchmark dataset comprising natural language true/false statements, with each sample paired with its complementary counterpart, resulting in 4k sentence pairs. We propose a pairwise accuracy metric to reliably measure an agent's ability to perform commonsense reasoning over a given situation. The dataset is crowdsourced and enhanced with an adversarial model-in-the-loop setup to incentivize challenging samples. To facilitate a systematic analysis of commonsense…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
