Evaluating Commonsense in Pre-trained Language Models

Xuhui Zhou; Yue Zhang; Leyang Cui; Dandan Huang

arXiv:1911.11931 · cs.CL · February 12, 2021 · 27 cites

TL;DR

This paper evaluates the commonsense reasoning abilities of prominent pre-trained language models on seven benchmarks. It finds that their commonsense knowledge is largely surface-level and identifies inference and robustness as areas for improvement.

Contribution

The paper presents a comprehensive evaluation of commonsense in GPT, BERT, XLNet, and RoBERTa, and releases the CATs benchmark dataset for future research.

Findings

01

Language modeling and its variants are effective training objectives for commonsense ability; bi-directional context and larger training sets help further.

02

Current models struggle with inference-heavy tasks.

03

Models show confusion on correlated test cases, indicating surface-level commonsense learning.

Abstract

Contextualized representations trained over large raw text data have given remarkable improvements for NLP tasks including question answering and reading comprehension. There have been works showing that syntactic, semantic and word sense knowledge are contained in such representations, which explains why they benefit such tasks. However, relatively little work has been done investigating commonsense knowledge contained in contextualized representations, which is crucial for human question answering and reading comprehension. We study the commonsense ability of GPT, BERT, XLNet, and RoBERTa by testing them on seven challenging benchmarks, finding that language modeling and its variants are effective objectives for promoting models' commonsense ability while bi-directional context and larger training set are bonuses. We additionally find that current models do poorly on tasks require…

Code & Models

Repositories

XuhuiZhou/CATS
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Test · Linear Layer · Cosine Annealing · Discriminative Fine-Tuning · Linear Warmup With Cosine Annealing · SentencePiece · Byte Pair Encoding · GPT · XLNet · Adam