Algerian Dialect

Zakaria Benmounah; Abdennour Boulesnane

arXiv:2512.19543·cs.CL·December 23, 2025

Open Access

TL;DR

This paper introduces a large, manually annotated sentiment dataset of 45,000 Algerian Arabic YouTube comments, facilitating research in dialectal NLP and social media analysis.

Contribution

The paper provides a large sentiment-annotated dataset for the Algerian dialect, with rich metadata, to support NLP research on under-resourced dialects.

Findings

01

The dataset contains 45,000 comments, each manually labeled with one of five sentiment categories.

02

Metadata includes collection timestamps, like counts, video URLs, and annotation dates.

03

The dataset is publicly available on Mendeley Data under a CC BY 4.0 license.

Abstract

We present Algerian Dialect, a large-scale sentiment-annotated dataset consisting of 45,000 YouTube comments written in Algerian Arabic dialect. The comments were collected from more than 30 Algerian press and media channels using the YouTube Data API. Each comment is manually annotated into one of five sentiment categories: very negative, negative, neutral, positive, and very positive. In addition to sentiment labels, the dataset includes rich metadata such as collection timestamps, like counts, video URLs, and annotation dates. This dataset addresses the scarcity of publicly available resources for Algerian dialect and aims to support research in sentiment analysis, dialectal Arabic NLP, and social media analytics. The dataset is publicly available on Mendeley Data under a CC BY 4.0 license at https://doi.org/10.17632/zzwg3nnhsz.2.
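
The five-way labeling described in the abstract forms a natural ordinal scale. The sketch below shows one possible way to encode it as integer scores and to tally a label distribution. The `Comment` field names, the helper functions, and the sample rows are all hypothetical placeholders: the abstract does not specify the released file layout, so column names in the actual Mendeley Data files may differ.

```python
from collections import Counter
from dataclasses import dataclass

# The five categories named in the abstract, ordered from most negative to most positive.
SENTIMENT_SCALE = ["very negative", "negative", "neutral", "positive", "very positive"]

# Ordinal encoding centered on neutral: -2 .. +2 (an illustrative choice, not the authors').
LABEL_TO_SCORE = {label: i - 2 for i, label in enumerate(SENTIMENT_SCALE)}


@dataclass
class Comment:
    # Hypothetical field names; the released files may use different column names.
    text: str
    sentiment: str
    like_count: int
    video_url: str


def to_score(label: str) -> int:
    """Map a sentiment label to an integer in [-2, 2]; raise on unknown labels."""
    key = label.strip().lower()
    if key not in LABEL_TO_SCORE:
        raise ValueError(f"unknown sentiment label: {label!r}")
    return LABEL_TO_SCORE[key]


def label_distribution(comments):
    """Count comments per label, reported in scale order (zero for absent labels)."""
    counts = Counter(c.sentiment.strip().lower() for c in comments)
    return {label: counts.get(label, 0) for label in SENTIMENT_SCALE}


# Made-up placeholder rows, not taken from the dataset.
sample = [
    Comment("placeholder comment 1", "very positive", 12, "https://example.com/v1"),
    Comment("placeholder comment 2", "Negative", 0, "https://example.com/v2"),
    Comment("placeholder comment 3", "neutral", 3, "https://example.com/v1"),
]

scores = [to_score(c.sentiment) for c in sample]
distribution = label_distribution(sample)
```

Normalizing case and whitespace before lookup guards against minor label inconsistencies, and rejecting unknown labels outright makes annotation errors visible instead of silently mapping them to a default.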

Taxonomy

Topics: Linguistic Variation and Morphology · Language, Linguistics, Cultural Analysis · Authorship Attribution and Profiling