JobArabi: An Arabic Corpus and Analysis of Job Announcements from Social Media
Wajdi Zaghouani, Shimaa Amer Ibrahim, Mabrouka Bessghaier, Houda Bouamor

TL;DR
JobArabi is a comprehensive Arabic social media corpus of job announcements enabling sociolinguistic and labor market analysis, with detailed metadata and linguistic annotations.
Contribution
It introduces a large-scale, annotated Arabic job announcement corpus from social media, facilitating research in NLP, social science, and digital labor studies.
Findings
Gendered language persists in online recruitment.
Occupational demand varies by region.
The emotional tone of recruitment messages varies across regions.
Abstract
This paper introduces JobArabi, a large-scale corpus of Arabic job announcements collected from social media between January 2024 and October 2025. The dataset contains 20,528 public posts from X and captures nearly two years of employment-related discourse across Arabic-speaking online communities. The corpus was compiled using a linguistically informed query framework covering 21 Arabic keyword families that reflect gendered, plural, formal, and dialectal expressions of recruitment language. The resulting dataset includes posts from institutional, commercial, and individual accounts and provides metadata such as timestamps, engagement indicators, and geolocation when available, enabling temporal and regional analysis of employment discourse. Quantitative analysis reveals several sociolinguistic patterns in online recruitment, including the persistence of gendered hiring language,…
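As a rough illustration of what a keyword-family query framework might look like, the sketch below groups a few recruitment phrases into families, builds a boolean OR query from them, and tags a post with the families it matches. The four families, their phrases, and the names `KEYWORD_FAMILIES`, `build_query`, and `matched_families` are all hypothetical; the paper's actual 21 keyword families and collection pipeline are not given in this summary.

```python
import re

# Hypothetical keyword families for illustration only; these are NOT the
# 21 families used to build JobArabi.
KEYWORD_FAMILIES = {
    "masculine": ["مطلوب موظف"],    # "employee wanted" (masculine singular)
    "feminine": ["مطلوب موظفة"],    # "employee wanted" (feminine singular)
    "plural": ["مطلوب موظفين"],     # "employees wanted" (plural)
    "formal": ["وظائف شاغرة"],      # "vacant positions" (formal register)
}


def build_query(families):
    """Join every phrase into one quoted OR query string."""
    phrases = [f'"{p}"' for group in families.values() for p in group]
    return " OR ".join(phrases)


def matched_families(text, families):
    """Return the sorted names of families with a whole-phrase match in text.

    Word-boundary lookarounds keep the masculine form from matching inside
    the feminine or plural forms, which only add a suffix to it.
    """
    hits = []
    for name, phrases in families.items():
        for phrase in phrases:
            pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
            if re.search(pattern, text):
                hits.append(name)
                break
    return sorted(hits)
```

Tagging each post with the family that retrieved it is one way a corpus like this could support the gendered-language comparison mentioned in the findings, since the masculine, feminine, and plural forms can then be counted separately.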
