Text Detoxification in isiXhosa and Yorùbá: A Cross-Lingual Machine Learning Approach for Low-Resource African Languages
Abayomi O. Agbeyangi

TL;DR
This paper presents a hybrid machine learning approach for automatic text detoxification in isiXhosa and Yorùbá, addressing the scarcity of toxicity mitigation tools for low-resource African languages.
Contribution
It introduces a novel, interpretable hybrid model that combines TF-IDF features with Logistic Regression for toxicity detection and rule-based rewriting for toxicity neutralization in low-resource languages.
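The detect-then-rewrite design can be illustrated with a minimal, dependency-free Python sketch. It assumes one binary toxic/neutral label per sentence. TF-IDF features feed a Logistic Regression detector, and only sentences the detector flags are passed to a lexicon-based rewriter. This is not the author's implementation. The helper names (`tokenize`, `fit_idf`, `train_lr`, `rewrite`, `detoxify`), the 0.5 threshold, the hyperparameters, the toy sentences, and the `LEXICON` entries are all hypothetical. The lexicon uses English placeholders rather than the paper's isiXhosa and Yorùbá rewrites. NFC normalization, plus a token pattern that keeps combining tone marks attached to their words, is included because the corpus is described as carrying diacritics. In practice, scikit-learn's `TfidfVectorizer` and `LogisticRegression` would normally replace the hand-written feature and training code.

```python
import math
import re
import unicodedata
from collections import Counter

# Word characters plus the combining-diacritics block, so tone marks with no
# precomposed form (e.g. a grave on e-with-dot-below) stay inside their token.
WORD = r"[\w\u0300-\u036f]"

def tokenize(text):
    text = unicodedata.normalize("NFC", text).lower()
    return re.findall(WORD + "+", text)

def fit_idf(docs):
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed idf, ln((1 + n) / (1 + df)) + 1, the same form as scikit-learn's default.
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(tokens, idf):
    counts = Counter(t for t in tokens if t in idf)  # terms unseen in training are dropped
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}  # L2-normalised sparse vector

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative z
    return e / (1.0 + e)

def predict_proba(x, w, b):
    return sigmoid(b + sum(w.get(t, 0.0) * v for t, v in x.items()))

def train_lr(X, y, epochs=300, lr=0.5, l2=1e-3):
    """Batch gradient descent on the L2-regularised logistic loss (hypothetical settings)."""
    w, b, n = {}, 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = Counter(), 0.0
        for x, label in zip(X, y):
            err = predict_proba(x, w, b) - label
            grad_b += err
            for t, v in x.items():
                grad_w[t] += err * v
        b -= lr * grad_b / n
        for t in set(w) | set(grad_w):
            w[t] = w.get(t, 0.0) - lr * (grad_w[t] / n + l2 * w.get(t, 0.0))
    return w, b

# Hypothetical placeholder lexicon: English stand-ins, NOT the paper's
# isiXhosa/Yorùbá toxic-to-neutral entries.
LEXICON = {"idiot": "unkind person", "stupid": "unwise", "shut up": "please be quiet"}

def rewrite(text, lexicon=LEXICON):
    text = unicodedata.normalize("NFC", text)
    for toxic in sorted(lexicon, key=len, reverse=True):  # multi-word entries first
        neutral = lexicon[toxic]
        pattern = f"(?<!{WORD}){re.escape(toxic)}(?!{WORD})"  # whole-token matches only
        text = re.sub(pattern, lambda _m: neutral, text, flags=re.IGNORECASE)
    return text

def detoxify(text, w, b, idf, threshold=0.5):
    """Rewrite only when the detector flags the text; return (output, p_toxic)."""
    p = predict_proba(tfidf(tokenize(text), idf), w, b)
    return (rewrite(text) if p >= threshold else text), p

# Toy training data, invented for illustration (1 = toxic, 0 = neutral).
texts = ["you are an idiot", "shut up you stupid idiot", "stupid people like you",
         "thank you for the help", "see you at the market", "good morning to you"]
labels = [1, 1, 1, 0, 0, 0]
docs = [tokenize(t) for t in texts]
idf = fit_idf(docs)
w, b = train_lr([tfidf(d, idf) for d in docs], labels)
print(detoxify("What an idiot!", w, b, idf))
print(detoxify("good morning friend", w, b, idf))
```

Gating the rewriter on the detector leaves neutral text untouched. In this sketch both the learned token weights and the lexicon can be inspected directly, so each flag and each substitution can be traced to specific tokens, which illustrates one sense in which such a pipeline is interpretable.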
Findings
Detection accuracy of up to 86% in Yorùbá (stratified K-fold), with per-language ROC-AUCs up to 0.88
Successful detoxification of all toxic sentences
Developed a parallel corpus capturing idiomatic and code-switching usage
Abstract
Toxic language is one of the major barriers to safe online participation, yet robust mitigation tools are scarce for African languages. This study addresses this critical gap by investigating automatic text detoxification (toxic-to-neutral rewriting) for two low-resource African languages, isiXhosa and Yorùbá. The work contributes a novel, pragmatic hybrid methodology: a lightweight, interpretable TF-IDF and Logistic Regression model for transparent toxicity detection, and a controlled lexicon- and token-guided rewriting component. A parallel corpus of toxic-to-neutral rewrites, capturing idiomatic usage, diacritics, and code-switching, was developed to train and evaluate the model. The detection component achieved stratified K-fold accuracies of 61-72% (isiXhosa) and 72-86% (Yorùbá), with per-language ROC-AUCs up to 0.88. The rewriting component successfully detoxified all…
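The evaluation protocol the abstract reports (per-fold accuracy and ROC-AUC under stratified K-fold cross-validation) can be sketched in plain Python. The function names and the `fit_and_score` callback interface are hypothetical. The round-robin fold assignment is one simple way to stratify, not necessarily the one used in the study. scikit-learn's `StratifiedKFold` and `roc_auc_score` are the usual library equivalents. ROC-AUC is computed from its rank (Mann-Whitney) interpretation: the probability that a randomly chosen toxic sentence scores higher than a randomly chosen neutral one, with ties counted as one half.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=5, seed=0):
    """Split indices into k folds that each preserve the class proportions.

    Returns a list of (train_indices, test_indices) pairs.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for idx in by_class.values():
        rng.shuffle(idx)
        # Deal this class round-robin across folds; carrying the offset between
        # classes keeps fold sizes balanced when counts are not multiples of k.
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)
        offset += len(idx)
    everything = set(range(len(labels)))
    return [(sorted(everything - set(f)), sorted(f)) for f in folds]

def roc_auc(y_true, scores):
    """Rank-based ROC-AUC: P(random positive outscores random negative), ties count 1/2."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC is undefined when a fold contains only one class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(y_true, scores, threshold=0.5):
    """Fraction of items whose thresholded score matches the label."""
    return sum((s >= threshold) == (y == 1) for s, y in zip(scores, y_true)) / len(y_true)

def cross_validate(labels, fit_and_score, k=5, seed=0):
    """Per-fold (accuracy, ROC-AUC); fit_and_score(train_idx, test_idx) returns test scores."""
    results = []
    for train_idx, test_idx in stratified_kfold(labels, k, seed):
        scores = fit_and_score(train_idx, test_idx)
        y_test = [labels[i] for i in test_idx]
        results.append((accuracy(y_test, scores), roc_auc(y_test, scores)))
    return results

# Demo with a hypothetical stand-in scorer that ignores the training split.
demo_labels = [1, 1, 1, 1, 0, 0, 0, 0]
fixed_scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
print(cross_validate(demo_labels, lambda tr, te: [fixed_scores[i] for i in te], k=2))
```

Passing a `fit_and_score` that trains a detector on `train_idx` and returns toxic-class probabilities for `test_idx` yields one (accuracy, ROC-AUC) pair per fold. Accuracy ranges such as those in the abstract read naturally as the spread of such per-fold values.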
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Topic Modeling · Text Readability and Simplification
