AraStance: A Multi-Country and Multi-Domain Dataset of Arabic Stance Detection for Fact Checking

Tariq Alhindi, Amal Alabdulkarim, Ali Alshehri, Muhammad Abdul-Mageed, and Preslav Nakov

arXiv:2104.13559 · cs.CL · May 19, 2021

TL;DR

This paper introduces AraStance, an Arabic stance detection dataset of 4,063 claim–article pairs drawn from multiple countries and domains, and benchmarks it with BERT-based models, highlighting both the difficulty of the task and the dataset's potential for improving fact-checking systems.

Contribution

The paper presents AraStance, a new multi-country, multi-domain Arabic stance detection dataset, and provides baseline results with BERT-based models to advance Arabic fact-checking research.

Findings

01. Best BERT model achieves 85% accuracy
02. Dataset covers diverse domains and countries
03. Stance detection remains a challenging task
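The 85% figure above is accuracy: the fraction of claim–article pairs whose predicted stance matches the gold stance. Accuracy can look high when one label dominates, which is one reason stance detection stays hard to "solve", so per-class F1 is a common complement. The minimal Python sketch below computes both on made-up data; the label names (agree, disagree, unrelated) and the toy predictions are hypothetical, not AraStance results.

```python
def accuracy(gold, pred):
    """Fraction of claim-article pairs whose predicted stance matches the gold stance."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def per_class_f1(gold, pred):
    """F1 score for every stance label that appears in gold or pred."""
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[label] = (2 * precision * recall / (precision + recall)
                         if precision + recall else 0.0)
    return scores


# Hypothetical toy data: the minority "disagree" class is never predicted correctly.
gold = ["unrelated"] * 6 + ["agree"] * 2 + ["disagree"] * 2
pred = ["unrelated"] * 6 + ["agree"] * 2 + ["agree", "unrelated"]

print(accuracy(gold, pred))  # 0.8
print({k: round(v, 3) for k, v in per_class_f1(gold, pred).items()})
```

Here accuracy is 0.8 even though F1 for "disagree" is 0.0, which shows how a single accuracy number can hide a weak minority class.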

Abstract

With the continuing spread of misinformation and disinformation online, it is of increasing importance to develop combating mechanisms at scale in the form of automated systems that support multiple languages. One task of interest is claim veracity prediction, which can be addressed using stance detection with respect to relevant documents retrieved online. To this end, we present our new Arabic Stance Detection dataset (AraStance) of 4,063 claim–article pairs from a diverse set of sources comprising three fact-checking websites and one news website. AraStance covers false and true claims from multiple domains (e.g., politics, sports, health) and several Arab countries, and it is well-balanced between related and unrelated documents with respect to the claims. We benchmark AraStance, along with two other stance detection datasets, using a number of BERT-based models. Our best model…
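The abstract links stance detection to claim veracity prediction: once a model labels how each retrieved article relates to a claim, those per-article labels can be combined into a claim-level verdict. The Python sketch below shows one simple way to do that, assuming a four-way label set (agree, disagree, discuss, unrelated) and a plain majority vote. Both the label names and the voting rule are illustrative assumptions, not the pipeline described in the paper.

```python
from collections import Counter

# Assumption: only these stances count as evidence for or against a claim;
# "discuss" and "unrelated" articles are ignored by this toy rule.
EVIDENCE_LABELS = {"agree", "disagree"}


def aggregate_veracity(stances):
    """Hypothetical majority vote over the stances of retrieved articles toward one claim."""
    counts = Counter(s for s in stances if s in EVIDENCE_LABELS)
    if counts["agree"] > counts["disagree"]:
        return "supported"
    if counts["disagree"] > counts["agree"]:
        return "refuted"
    return "not enough evidence"  # tie, or no agree/disagree articles at all


print(aggregate_veracity(["agree", "unrelated", "agree", "discuss", "disagree"]))  # supported
print(aggregate_veracity(["unrelated", "discuss"]))  # not enough evidence
```

A real system would more likely weight articles by model confidence or source reliability; the majority vote only makes the stance-to-veracity step concrete.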

Code & Models

Repositories

Tariq60/arastance (official)

Datasets

strombergnlp/ara-stance
Dataset · 166 downloads