Evaluating the Effectiveness of Natural Language Inference for Hate Speech Detection in Languages with Limited Labeled Data

Janis Goldzycher; Moritz Preisig; Chantal Amrhein; Gerold Schneider

arXiv:2306.03722 · cs.CL · June 13, 2023 · 1 citation

TL;DR

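This study tests whether natural language inference (NLI) models can improve hate speech detection in languages with limited labeled data. NLI fine-tuning yields large gains over direct fine-tuning in the target language, but a customised NLI formulation outperforms intermediate fine-tuning on English data only when that English data does not match the test domain.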

Contribution

The paper demonstrates the effectiveness of NLI models for hate speech detection in low-resource languages and provides practical recommendations for settings where only limited labeled data is available in the target language.

Findings

01

NLI fine-tuning outperforms direct fine-tuning in low-resource settings.

02

Intermediate fine-tuning on English data, as proposed in previous work, remains hard to match.

03

Customized NLI formulations can outperform intermediate fine-tuning when the English training data does not match the test domain.
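In an NLI formulation, a label is read off the model's entailment probability for one or more hypotheses, and a customised formulation changes which hypotheses are used and how their scores are combined. The sketch below is a hypothetical illustration, not the formulation evaluated in the paper: the logits are plain inputs rather than outputs of a real model, the entailment-vs-contradiction softmax (ignoring the neutral class) is an assumed convention common in zero-shot NLI classification, and the two-hypothesis rule and the function names `entailment_prob`, `predict_single`, and `predict_customised` are invented for illustration.

```python
import math


def entailment_prob(logit_entail: float, logit_contra: float) -> float:
    """Softmax over the entailment and contradiction logits, ignoring neutral.

    Assumed convention for turning NLI logits into a class probability.
    """
    m = max(logit_entail, logit_contra)  # subtract max for numerical stability
    e = math.exp(logit_entail - m)
    c = math.exp(logit_contra - m)
    return e / (e + c)


def predict_single(p_hate: float, threshold: float = 0.5) -> int:
    """Single hypothesis: label hateful (1) if the hate hypothesis is entailed."""
    return int(p_hate > threshold)


def predict_customised(p_hate: float, p_targets_group: float,
                       threshold: float = 0.5) -> int:
    """Hypothetical two-hypothesis rule: hateful only if both are entailed.

    p_hate: entailment probability for a hypothesis such as
        "This text contains hate speech." (assumed wording)
    p_targets_group: entailment probability for a hypothesis such as
        "This text is about a group of people." (assumed wording)
    """
    return int(p_hate > threshold and p_targets_group > threshold)


if __name__ == "__main__":
    p_hate = entailment_prob(2.0, -1.0)    # about 0.95
    p_group = entailment_prob(-1.0, 1.5)   # about 0.08
    print(predict_single(p_hate))               # 1
    print(predict_customised(p_hate, p_group))  # 0
```

The second rule is stricter by design: a post that the first hypothesis flags is kept only if the additional hypothesis also holds, which is one way a formulation can encode extra knowledge about the target task without more labeled data.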

Abstract

Most research on hate speech detection has focused on English, where a sizeable amount of labeled training data is available. However, to expand hate speech detection into more languages, approaches that require minimal training data are needed. In this paper, we test whether natural language inference (NLI) models, which perform well in zero- and few-shot settings, can benefit hate speech detection performance in scenarios where only a limited amount of labeled data is available in the target language. Our evaluation on five languages demonstrates large performance improvements of NLI fine-tuning over direct fine-tuning in the target language. However, the effectiveness of previous work that proposed intermediate fine-tuning on English data is hard to match. Only in settings where the English training data does not match the test domain can our customised NLI formulation outperform…
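NLI fine-tuning differs from direct fine-tuning mainly in how each labeled example is presented to the model. The sketch below illustrates the general idea under stated assumptions: instead of mapping a post directly to a hate/non-hate label, each post becomes the premise of an NLI pair with a fixed hypothesis, and the binary label is mapped onto NLI classes. The hypothesis wording, the label mapping, and the names `to_nli_pairs` and `NLIExample` are hypothetical, not the formulation used in the paper or in the jagol/xnli4xhsd repository.

```python
from dataclasses import dataclass

# Hypothetical hypothesis wording; the paper's actual hypotheses may differ.
HYPOTHESIS = "This text contains hate speech."

# Assumed label mapping: hateful (1) -> entailment, not hateful (0) -> contradiction.
LABEL_TO_NLI = {1: "entailment", 0: "contradiction"}


@dataclass(frozen=True)
class NLIExample:
    premise: str     # the original post
    hypothesis: str  # natural-language description of the positive class
    label: str       # NLI class used as the training target


def to_nli_pairs(texts: list[str], labels: list[int],
                 hypothesis: str = HYPOTHESIS) -> list[NLIExample]:
    """Recast binary hate speech examples as NLI premise/hypothesis pairs."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    return [
        NLIExample(premise=t, hypothesis=hypothesis, label=LABEL_TO_NLI[y])
        for t, y in zip(texts, labels)
    ]


if __name__ == "__main__":
    for pair in to_nli_pairs(["example post A", "example post B"], [1, 0]):
        print(pair.label, "|", pair.premise, "|", pair.hypothesis)
```

The general intuition behind this kind of recasting, stated here as an assumption rather than a description of the paper's setup, is that a model already trained on NLI can reuse its existing entailment classifier for the target-language examples instead of learning a newly initialized classification head from only a few labels.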

Code & Models

Repositories

jagol/xnli4xhsd (official)

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Internet Traffic Analysis and Secure E-voting

Methods: Test