Zero-Shot Commonsense Validation and Reasoning with Large Language Models: An Evaluation on SemEval-2020 Task 4 Dataset

Rawand Alfugaha; Mohammad AL-Smadi

arXiv:2502.15810·cs.CL·February 25, 2025

TL;DR

This paper evaluates large language models on the SemEval-2020 Task 4 commonsense validation and explanation tasks, showing that larger models perform well but still struggle with explanation relevance and causal inference.

Contribution

It provides a comprehensive zero-shot evaluation of multiple LLMs on commonsense tasks, highlighting their strengths and limitations compared to fine-tuned models.

Findings

01 LLaMA3-70B achieves 98.40% accuracy on Task A (Commonsense Validation).

02 Models outperform previous baselines in validation but lag behind them in explanation.

03 Challenges remain in selecting relevant explanations and in causal reasoning.

Abstract

This study evaluates the performance of Large Language Models (LLMs) on the SemEval-2020 Task 4 dataset, focusing on commonsense validation and explanation. Our methodology involves evaluating multiple LLMs, including LLaMA3-70B, Gemma2-9B, and Mixtral-8x7B, using zero-shot prompting techniques. The models are tested on two tasks: Task A (Commonsense Validation), where models determine whether a statement aligns with commonsense knowledge, and Task B (Commonsense Explanation), where models identify the reasoning behind implausible statements. Performance is assessed based on accuracy, and results are compared to fine-tuned transformer-based models. The results indicate that larger models outperform previous models and perform close to human evaluation on Task A, with LLaMA3-70B achieving the highest accuracy of 98.40% in Task A while lagging behind previous models with 93.40% in Task…
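
The zero-shot, accuracy-based evaluation described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the prompt wording, the function names (`build_task_a_prompt`, `parse_choice`, `accuracy`), and the example sentences are all assumptions, and the `model` argument is a stand-in for whatever LLM client is used. It assumes the pairwise form of Task A, in which the model picks which of two statements goes against common sense.

```python
import re
from typing import Callable, List, Optional, Tuple

# One example: (statement 0, statement 1, index of the implausible statement).
Example = Tuple[str, str, int]


def build_task_a_prompt(sent0: str, sent1: str) -> str:
    """Build a zero-shot prompt (hypothetical wording, no in-context examples)."""
    return (
        "Which of the following two statements is against common sense?\n"
        f"0: {sent0}\n"
        f"1: {sent1}\n"
        "Answer with 0 or 1 only."
    )


def parse_choice(reply: str) -> Optional[int]:
    """Extract the first standalone 0 or 1 from the model reply, if any."""
    match = re.search(r"\b[01]\b", reply)
    return int(match.group()) if match else None


def accuracy(examples: List[Example], model: Callable[[str], str]) -> float:
    """Fraction of examples where the parsed choice equals the gold label."""
    correct = 0
    for sent0, sent1, gold in examples:
        pred = parse_choice(model(build_task_a_prompt(sent0, sent1)))
        correct += int(pred == gold)  # an unparseable reply counts as wrong
    return correct / len(examples)


# Illustrative data only; these are not taken from the paper's test set.
examples: List[Example] = [
    ("He put a turkey into the fridge.", "He put an elephant into the fridge.", 1),
    ("The sun rises in the west.", "The sun rises in the east.", 0),
]


def always_one(prompt: str) -> str:
    """Stub standing in for a real LLM call; always answers '1'."""
    return "1"


score = accuracy(examples, always_one)
```

Swapping `always_one` for a function that sends the prompt to a hosted model would give a zero-shot accuracy figure comparable in kind, though not necessarily in value, to those reported in the paper. Task B could be handled the same way with a multiple-choice prompt over candidate explanations.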

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling