Cohesion-6K: An Arabic Dataset for Analyzing Social Cohesion and Conflict in Online Discourse
Aisha Ali Al-Athba, Wajdi Zaghouani

TL;DR
Cohesion-6K is a new dataset of Arabic Facebook posts annotated with discourse categories ranging from conflict to cohesion, enabling analysis of social cohesion and polarization in online discourse.
Contribution
The paper introduces a dataset of 6,000 Arabic posts with discourse labels, annotated through a combination of manual annotation and ChatGPT-assisted pre-labeling, to support computational social science research.
Findings
Conflict posts attract 2-4 times more engagement than resolution posts
The annotation process achieved a Cohen's kappa of 0.85, indicating high inter-annotator agreement
Dataset supports future research in social cohesion, polarization, and Arabic NLP
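As a rough illustration of how an agreement figure such as the reported Cohen's kappa is computed, the sketch below implements the standard two-annotator formula, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each annotator's label frequencies. The function name `cohens_kappa` and the toy label lists are assumptions made for this example; they are not taken from the paper, and the authors' actual tooling is not described here.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Hypothetical helper for illustration; not the authors' code.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items.")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: sum over labels of the product of each annotator's
    # marginal probability of using that label.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    all_labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in all_labels) / (n * n)

    # Degenerate case: both annotators used one identical label for every item.
    if expected == 1.0:
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Toy, made-up annotations using two of the five category names.
annotator_1 = ["Conflict", "Conflict", "Resolution", "Resolution"]
annotator_2 = ["Conflict", "Conflict", "Resolution", "Conflict"]
print(cohens_kappa(annotator_1, annotator_2))
```

In this toy case the annotators agree on 3 of 4 items (p_o = 0.75) while chance agreement is p_e = 0.5, so kappa comes out to 0.5. Kappa is lower than raw agreement because it discounts the matches that label frequencies alone would produce.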
Abstract
The study of online discourse has become central to understanding societal polarization. While much research has focused on detecting overt toxicity, the subtle dynamics of social cohesion, meaning the interaction between divisive and unifying narratives, remain computationally underexplored (Bail, 2021; Gonzalez-Bailon and Lelkes, 2023). This paper presents Cohesion-6K, a dataset of six thousand Arabic public Facebook posts related to the Israeli Occupation of Palestine, annotated manually with ChatGPT assistance. Each post is assigned to one of five discourse categories that represent a continuum from conflict to cohesion: Conflict, Resolution, Community Engagement, Supportive Interactions, and Shared Values. The annotation process combines expert human judgment with model-assisted pre-labeling verified by trained annotators, achieving substantial inter-annotator agreement (Cohen's kappa =…
