Culture Matters in Toxic Language Detection in Persian

Zahra Bokaei; Walid Magdy; Bonnie Webber

arXiv:2506.03458 · cs.CL · June 5, 2025

TL;DR

This paper investigates how cultural context influences toxic language detection in Persian. It compares fine-tuning, data enrichment, zero-shot and few-shot learning, and cross-lingual transfer learning, and shows that cultural similarity between languages affects how well transfer learning works.

Contribution

It compares several detection methods for Persian toxic language and shows that cultural similarity between the source language and Persian shapes the effectiveness of cross-lingual transfer learning.

Findings

01

Cultural similarity improves transfer learning performance

02

Fine-tuning and data enrichment enhance detection accuracy

03

Cross-lingual transfer varies with cultural proximity

Abstract

Toxic language detection is crucial for creating safer online environments and limiting the spread of harmful content. Because toxic language detection has been under-explored in Persian, the current work compares different methods for this task, including fine-tuning, data enrichment, zero-shot and few-shot learning, and cross-lingual transfer learning. What is especially compelling is the impact of cultural context on transfer learning for this task: we show that transferring from the language of a country with cultural similarities to Persian yields better results. Conversely, the improvement is smaller when the source language comes from a culturally distinct country. Warning: This paper contains examples of toxic language that may disturb some readers. These examples are included for the purpose of research on toxic language detection.
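The cross-lingual comparison described in the abstract can be framed as measuring, for each candidate source language, how much adding its data changes performance on a fixed Persian test set relative to a Persian-only baseline. The sketch below is a minimal illustration under that assumption, not the authors' code: the names `transfer_gains`, `rank_sources`, and `train_and_score` are hypothetical, and `train_and_score` stands in for whatever fine-tune-and-evaluate routine (for example, macro-F1 of a fine-tuned classifier) one actually uses.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical schema: (text, label) pairs with 1 = toxic, 0 = non-toxic.
Example = Tuple[str, int]

# Hypothetical placeholder: trains on the first list and returns a score
# (e.g. macro-F1) measured on the second list.
TrainAndScore = Callable[[List[Example], List[Example]], float]


def transfer_gains(
    persian_train: List[Example],
    persian_test: List[Example],
    source_corpora: Dict[str, List[Example]],
    train_and_score: TrainAndScore,
) -> Dict[str, float]:
    """Return, per source language, the score change over a Persian-only baseline.

    Each source corpus is concatenated with the Persian training data and the
    model is retrained; evaluation always uses the same Persian test set, so
    the gains are directly comparable across source languages.
    """
    baseline = train_and_score(persian_train, persian_test)
    gains: Dict[str, float] = {}
    for language, source_train in source_corpora.items():
        mixed_train = source_train + persian_train
        gains[language] = train_and_score(mixed_train, persian_test) - baseline
    return gains


def rank_sources(gains: Dict[str, float]) -> List[str]:
    """Source languages ordered from largest to smallest gain."""
    return sorted(gains, key=gains.get, reverse=True)
```

Keeping the test set fixed and Persian-only isolates the contribution of each source language; ranking the resulting gains is one way to check whether culturally closer source languages yield larger improvements, as the abstract reports.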

Videos

Culture Matters in Toxic Language Detection in Persian (Underline)

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Authorship Attribution and Profiling · Topic Modeling