ThatiAR: Subjectivity Detection in Arabic News Sentences

Reem Suwaileh; Maram Hasanain; Fatema Hubail; Wajdi Zaghouani; Firoj Alam

arXiv:2406.05559 · cs.CL · June 11, 2024

TL;DR

This paper introduces the first large Arabic dataset for subjectivity detection in news sentences, analyzes annotation challenges, and benchmarks pre-trained language models (PLMs) and large language models (LLMs), highlighting the effectiveness of LLMs with in-context learning.

Contribution

It provides a new Arabic dataset for subjectivity detection, a detailed analysis of the factors that influence annotation, and an evaluation of multiple models that highlights the superior performance of LLMs.

Findings

01

LLMs with in-context learning outperform other models

02

Annotators' backgrounds significantly influence annotation quality

03

The dataset facilitates future research in Arabic NLP
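
Comparing models, as in the first finding, requires a shared metric. The sketch below assumes macro-averaged F1 over a hypothetical SUBJ/OBJ label set (macro-F1 is the metric commonly used for the CheckThat! subjectivity task; this page does not state which metric the paper reports). It uses plain Python with no external dependencies.

```python
from __future__ import annotations


def macro_f1(gold: list[str], pred: list[str],
             labels: tuple[str, ...] = ("SUBJ", "OBJ")) -> float:
    """Unweighted mean of per-label F1 scores (assumed SUBJ/OBJ labels)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is defined as 0 when both precision and recall are 0.
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    return sum(scores) / len(scores)
```

Macro averaging weights both classes equally, so a model cannot score well by predicting only the majority class; that matters when subjective and objective sentences are not balanced.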

Abstract

Detecting subjectivity in news sentences is crucial for identifying media bias, enhancing credibility, and combating misinformation by flagging opinion-based content. It provides insights into public sentiment, empowers readers to make informed decisions, and encourages critical thinking. While research has developed methods and systems for this purpose, most efforts have focused on English and other high-resource languages. In this study, we present the first large dataset for subjectivity detection in Arabic, consisting of ~3.6K manually annotated sentences with GPT-4o-based explanations. In addition, we include instructions (in both English and Arabic) to facilitate LLM-based fine-tuning. We provide an in-depth analysis of the dataset and the annotation process, along with extensive benchmark results for pre-trained language models (PLMs) and large language models (LLMs). Our analysis of the annotation process highlights that annotators were…
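
The abstract mentions in-context learning and instructions for LLM-based fine-tuning but does not give a prompt format. As a minimal sketch, assuming a binary SUBJ/OBJ label set and a hypothetical prompt template (neither taken from the paper), a few-shot prompt for an LLM could be assembled and its reply parsed like this:

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical label set: assumes the data uses SUBJ / OBJ tags.
LABELS = ("SUBJ", "OBJ")


@dataclass
class Example:
    """One labeled demonstration sentence for in-context learning."""
    sentence: str
    label: str


def build_few_shot_prompt(demos: list[Example], query: str) -> str:
    """Assemble a few-shot prompt: instruction, labeled demos, then the query."""
    lines = [
        "Classify the Arabic news sentence as SUBJ (subjective) or OBJ (objective).",
        "Answer with one label only.",
        "",
    ]
    for demo in demos:
        if demo.label not in LABELS:
            raise ValueError(f"unknown label: {demo.label}")
        lines += [f"Sentence: {demo.sentence}", f"Label: {demo.label}", ""]
    # The query ends with an open "Label:" slot for the model to complete.
    lines += [f"Sentence: {query}", "Label:"]
    return "\n".join(lines)


def parse_label(reply: str) -> str | None:
    """Map a free-text model reply to a label, or None if none is recognized."""
    text = reply.strip().upper()
    for label in LABELS:
        if text.startswith(label):
            return label
    return None
```

The model call itself is left out; any chat or completion API could consume the returned string. Demonstrations would be drawn from the labeled training data, which this sketch does not load, and replies that `parse_label` cannot map are returned as `None` so they can be counted as failures rather than silently guessed.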

Code & Models

Datasets

AIWizards/clef2025_checkthat_task1_subjectivity
dataset · 318 downloads

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Sentiment Analysis and Opinion Mining