Detection and Analysis of Offensive Online Content in Hausa Language
Fatima Muhammad Adam, Abubakar Yakubu Zandam, Isa Inuwa-Dutse

TL;DR
This paper addresses the challenge of detecting offensive online content in Hausa, a low-resource language, by creating a new dataset and developing detection systems that outperform baseline models, emphasizing cultural and linguistic nuances.
Contribution
The study introduces the first offensive term dataset for Hausa and develops detection models that better capture linguistic nuances compared to multilingual baselines.
Findings
The detection system identified over 70% of offensive content.
Baseline models often mistranslated offensive terms.
Offensive language is prevalent in discussions on religion and politics.
Abstract
Hausa, a major Chadic language spoken by over 100 million people, mostly in West Africa, is considered a low-resource language from a computational linguistics perspective. This classification indicates a scarcity of the linguistic resources and tools needed for natural language processing (NLP) tasks, including the detection of offensive content. To address this gap, we conducted two sets of studies: (1) a user study (n=101) to explore cyberbullying in Hausa and (2) an empirical study that led to the creation of the first dataset of offensive terms in the Hausa language. We developed detection systems trained on this dataset and compared their performance against relevant multilingual models, including Google Translate. Our detection system successfully identified over 70% of offensive content, whereas baseline models frequently mistranslated such terms. We attribute this…
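As a rough illustration of how a term-list detector and a "share of offensive content identified" figure might fit together, here is a minimal Python sketch. It is not the authors' system: the lexicon-matching approach, the function names, and the placeholder terms are all assumptions made for illustration, and the reported figure is read here as recall over offensive examples, which the abstract does not state explicitly.

```python
import re
import unicodedata

# Hypothetical placeholder lexicon; these strings are NOT from the paper's dataset.
OFFENSIVE_TERMS = {"placeholder_a", "placeholder_b"}


def normalize(text):
    """Apply Unicode NFC normalization and case folding so that Hausa
    letters such as ɓ, ɗ, and ƙ compare consistently."""
    return unicodedata.normalize("NFC", text).casefold()


def tokenize(text):
    """Split normalized text into word tokens."""
    return re.findall(r"\w+", normalize(text))


def is_offensive(text, lexicon=OFFENSIVE_TERMS):
    """Flag text if any token appears in the lexicon (assumed approach)."""
    return any(token in lexicon for token in tokenize(text))


def recall(examples, lexicon=OFFENSIVE_TERMS):
    """Fraction of offensive-labelled examples that the detector flags.

    `examples` is a list of (text, is_offensive_label) pairs.
    """
    positives = [text for text, label in examples if label]
    if not positives:
        return 0.0
    hits = sum(is_offensive(text, lexicon) for text in positives)
    return hits / len(positives)
```

A pure lexicon lookup of this kind misses offensive content that uses terms outside the list, which is one reason a trained classifier may be preferable in practice.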
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
Methods: Sparse Evolutionary Training
