Evaluating Commonsense in Pre-trained Language Models
Xuhui Zhou, Yue Zhang, Leyang Cui, Dandan Huang

TL;DR
This paper evaluates the commonsense reasoning abilities of prominent pre-trained language models using seven benchmarks, revealing their surface-level understanding and highlighting areas for improvement in inference and robustness.
Contribution
It introduces a comprehensive evaluation of commonsense in models like GPT, BERT, XLNet, and RoBERTa, and releases the CATs benchmark dataset for future research.
Findings
Language modeling and its variants are effective training objectives for commonsense ability; bi-directional context and a larger training set help further.
Current models struggle with inference-heavy tasks.
Models show confusion on correlated test cases, indicating surface-level commonsense learning.
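The gap between per-item accuracy and robustness on correlated test cases can be illustrated with a small sketch. This is not the paper's evaluation code: `toy_score` is a hypothetical stand-in for a language model's sentence score, the two items are illustrative Winograd-style examples rather than entries from CATs, and the names `pick`, `accuracy`, and `pair_consistency` are assumptions made for this example.

```python
from typing import Callable, List, Sequence, Tuple

# An item is (candidate sentences, index of the correct candidate).
Item = Tuple[Sequence[str], int]
Scorer = Callable[[str], float]


def pick(score: Scorer, candidates: Sequence[str]) -> int:
    """Return the index of the candidate the scorer rates highest."""
    return max(range(len(candidates)), key=lambda i: score(candidates[i]))


def accuracy(score: Scorer, items: Sequence[Item]) -> float:
    """Fraction of items where the top-scored candidate is the gold one."""
    correct = sum(pick(score, cands) == gold for cands, gold in items)
    return correct / len(items)


def pair_consistency(score: Scorer, pairs: Sequence[Sequence[Item]]) -> float:
    """Fraction of correlated groups in which every item is answered correctly."""
    solved = sum(all(pick(score, c) == g for c, g in group) for group in pairs)
    return solved / len(pairs)


def toy_score(sentence: str) -> float:
    """Hypothetical surface-level scorer: always prefers 'the trophy' as subject."""
    return 1.0 if "because the trophy" in sentence else 0.0


# Two correlated items that differ in a single word (big vs. small).
ITEM_BIG: Item = (
    [
        "The trophy did not fit in the suitcase because the trophy was too big.",
        "The trophy did not fit in the suitcase because the suitcase was too big.",
    ],
    0,
)
ITEM_SMALL: Item = (
    [
        "The trophy did not fit in the suitcase because the trophy was too small.",
        "The trophy did not fit in the suitcase because the suitcase was too small.",
    ],
    1,
)

ITEMS: List[Item] = [ITEM_BIG, ITEM_SMALL]
PAIRS: List[List[Item]] = [[ITEM_BIG, ITEM_SMALL]]

if __name__ == "__main__":
    print("accuracy:", accuracy(toy_score, ITEMS))
    print("pair consistency:", pair_consistency(toy_score, PAIRS))
```

With this toy scorer, half of the individual items are answered correctly, yet no correlated pair is fully solved. A real evaluation would replace `toy_score` with a model-derived sentence score.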
Abstract
Contextualized representations trained over large raw text data have given remarkable improvements for NLP tasks including question answering and reading comprehension. Prior work has shown that syntactic, semantic and word sense knowledge is contained in such representations, which explains why they benefit such tasks. However, relatively little work has been done investigating commonsense knowledge contained in contextualized representations, which is crucial for human question answering and reading comprehension. We study the commonsense ability of GPT, BERT, XLNet, and RoBERTa by testing them on seven challenging benchmarks, finding that language modeling and its variants are effective objectives for promoting models' commonsense ability while bi-directional context and a larger training set are bonuses. We additionally find that current models do poorly on tasks require…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Test · Linear Layer · Cosine Annealing · Discriminative Fine-Tuning · Linear Warmup With Cosine Annealing · SentencePiece · Byte Pair Encoding · GPT · XLNet · Adam
