Detection and Analysis of Offensive Online Content in Hausa Language

Fatima Muhammad Adam; Abubakar Yakubu Zandam; Isa Inuwa-Dutse

arXiv:2311.10541 · cs.CL · March 10, 2025 · 1 citation

Open Access

TL;DR

This paper addresses the challenge of detecting offensive online content in Hausa, a low-resource language, by creating a new dataset and developing detection systems that outperform baseline models, emphasizing cultural and linguistic nuances.

Contribution

The study introduces the first offensive term dataset for Hausa and develops detection models that better capture linguistic nuances compared to multilingual baselines.

Findings

01. The detection system identified over 70% of offensive content.

02. Baseline models often mistranslated offensive terms.

03. Offensive language is prevalent in discussions on religion and politics.
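
To make the first finding concrete, here is a minimal sketch of how a figure such as "over 70% of offensive content identified" could be computed, assuming it is a recall-style measure over posts labelled offensive. This is not the authors' system: the paper describes detection systems trained on its dataset, whereas this sketch swaps in a plain lexicon matcher. The lexicon entries and the names `OFFENSIVE_TERMS`, `is_offensive`, and `detection_rate` are hypothetical placeholders, not terms or functions from the paper.

```python
import re

# Hypothetical placeholder lexicon. The paper's actual Hausa terms are
# not reproduced here; substitute a real term list to use this sketch.
OFFENSIVE_TERMS = {"badterm1", "badterm2"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def is_offensive(text, lexicon=OFFENSIVE_TERMS):
    """Flag a post if any of its tokens appears in the lexicon."""
    return any(token in lexicon for token in tokenize(text))


def detection_rate(posts, labels, lexicon=OFFENSIVE_TERMS):
    """Share of posts labelled offensive that the matcher flags (recall).

    `labels` holds one boolean per post: True if annotators marked the
    post as offensive. Returns 0.0 when no post is labelled offensive.
    """
    flagged = [is_offensive(post, lexicon)
               for post, label in zip(posts, labels) if label]
    return sum(flagged) / len(flagged) if flagged else 0.0


if __name__ == "__main__":
    posts = [
        "this has badterm1 inside",
        "clean post",
        "BADTERM2!",
        "offensive but uses an unseen word",
    ]
    labels = [True, False, True, True]
    print(f"detection rate: {detection_rate(posts, labels):.2f}")
```

A lexicon matcher like this misses offensive posts whose wording is not in the list, which is one reason a trained classifier may be preferred in practice.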

Abstract

Hausa, a major Chadic language spoken by over 100 million people, mostly in West Africa, is considered a low-resource language from a computational linguistics perspective. This classification indicates a scarcity of the linguistic resources and tools needed for natural language processing (NLP) tasks, including the detection of offensive content. To address this gap, we conducted two sets of studies: (1) a user study (n=101) to explore cyberbullying in Hausa and (2) an empirical study that led to the creation of the first dataset of offensive terms in the Hausa language. We developed detection systems trained on this dataset and compared their performance against relevant multilingual models, including Google Translate. Our detection system successfully identified over 70% of offensive content, whereas baseline models frequently mistranslated such terms. We attribute this…

Taxonomy

Topics: Hate Speech and Cyberbullying Detection

Methods: Sparse Evolutionary Training