Evaluate Confidence Instead of Perplexity for Zero-shot Commonsense Reasoning
Letian Peng, Zuchao Li, Hai Zhao

TL;DR
This paper introduces Non-Replacement Confidence (NRC), a new metric for zero-shot commonsense reasoning that outperforms traditional perplexity-based methods by better capturing contextual integrity in language models.
Contribution
It proposes NRC, a novel confidence measure based on ELECTRA's RTD objective, improving zero-shot commonsense reasoning performance over perplexity metrics.
Findings
NRC significantly improves zero-shot reasoning accuracy.
RTD-based PLMs possess essential commonsense knowledge.
NRC outperforms perplexity on multiple benchmarks.
Abstract
Commonsense reasoning is an appealing topic in natural language processing (NLP) as it plays a fundamental role in supporting the human-like actions of NLP systems. With large-scale language models as the backbone, unsupervised pre-training on numerous corpora shows the potential to capture commonsense knowledge. Current pre-trained language model (PLM)-based reasoning follows the traditional practice of using the perplexity metric. However, commonsense reasoning is more than the existing probability-based evaluation, which is biased by word frequency. This paper reconsiders the nature of commonsense reasoning and proposes a novel commonsense reasoning metric, Non-Replacement Confidence (NRC). In detail, it works on PLMs according to the Replaced Token Detection (RTD) pre-training objective in ELECTRA, in which the corruption detection objective reflects the confidence in contextual integrity that is…
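The contrast the abstract draws between perplexity and a replacement-detection confidence can be illustrated with a small sketch. The perplexity function follows the standard definition. The NRC function is only an assumption: the abstract is truncated before it defines the metric, so the aggregation used here (the mean log-probability that each token is not replaced) is a guess at one plausible form, not the authors' formula. The function names and the per-token probabilities are hypothetical and do not come from a real model.

```python
import math


def perplexity(token_log_probs):
    """Standard perplexity: exp of the negative mean natural-log token probability."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def non_replacement_confidence(replaced_probs):
    """ASSUMED aggregation, not confirmed by the source.

    replaced_probs holds, for each token, a discriminator's probability that
    the token was replaced. The score is the mean log-probability that the
    tokens were NOT replaced, so higher means more confidence.
    """
    return sum(math.log(1.0 - p) for p in replaced_probs) / len(replaced_probs)


def pick_by_confidence(candidates):
    """Return the name of the candidate with the highest confidence score."""
    return max(candidates, key=lambda name: non_replacement_confidence(candidates[name]))


# Hypothetical per-token "replaced" probabilities for two answer candidates.
# In practice these would come from an RTD-trained discriminator such as ELECTRA.
candidates = {
    "plausible": [0.02, 0.05, 0.03, 0.04],
    "implausible": [0.02, 0.05, 0.60, 0.04],
}

best = pick_by_confidence(candidates)
print(best)
```

In this sketch a single token that the discriminator flags as likely replaced lowers the whole candidate's score, which is the intuition behind ranking answers by contextual integrity rather than by word probability.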
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · WordPiece · Linear Warmup With Linear Decay · Attention Dropout · Dropout · Softmax
