Is this sentence valid? An Arabic Dataset for Commonsense Validation
Saja Tawalbeh, Mohammad AL-Smadi

TL;DR
This paper introduces the first Arabic dataset for commonsense validation, providing a benchmark for evaluating Arabic language models' understanding of commonsense in text.
Contribution
It presents a novel Arabic dataset for commonsense validation and baseline models, filling a gap in Arabic natural language understanding resources.
Findings
First Arabic dataset for commonsense validation
Baseline models trained on the dataset
Dataset available on GitHub
Abstract
Commonsense understanding and validation remain a challenging task in the field of natural language understanding. Several research papers have therefore studied the ability of proposed systems and models to validate commonsense in text. In this paper, we present a benchmark Arabic dataset for commonsense understanding and validation, as well as baseline research and models trained on the same dataset. To the best of our knowledge, this is the first dataset in the field of Arabic text commonsense validation. The dataset is distributed under the Creative Commons BY-SA 4.0 license and can be found on GitHub.
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
