Human-in-the-Loop for Data Collection: a Multi-Target Counter Narrative Dataset to Fight Online Hate Speech

Margherita Fanton; Helena Bonaldi; Serra Sinem Tekiroglu; Marco Guerini

arXiv:2107.08720·cs.CL·September 21, 2021

TL;DR

This paper introduces a human-in-the-loop methodology for creating high-quality, diverse hate speech and counter narrative datasets, resulting in the first expert-based multi-target HS/CN dataset to combat online hate speech.

Contribution

It presents a novel iterative data collection approach using generative models refined by experts, improving dataset quality and diversity for counter narrative generation.

Findings

01 Method is scalable and cost-effective.

02 Produces diverse and novel counter narratives.

03 Results in the only expert-based multi-target HS/CN dataset.

Abstract

Undermining the impact of hateful content with informed and non-aggressive responses, called counter narratives, has emerged as a possible solution for having healthier online communities. Thus, some NLP studies have started addressing the task of counter narrative generation. Although such studies have made an effort to build hate speech / counter narrative (HS/CN) datasets for neural generation, they fall short of reaching high quality, high quantity, or both. In this paper, we propose a novel human-in-the-loop data collection methodology in which a generative language model is refined iteratively by using its own data from the previous loops to generate new training samples that experts review and/or post-edit. Our experiments comprised several loops including dynamic variations. Results show that the methodology is scalable and facilitates diverse, novel, and cost-effective data…
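The iterative loop described in the abstract can be illustrated with a small sketch. This is only a toy illustration under stated assumptions, not the authors' implementation: the names run_loops, toy_generate, and toy_review are hypothetical, the "generator" is a plain function standing in for a fine-tuned language model, and the "expert" is a rule standing in for human review and post-editing. All pairs are placeholders.

```python
from typing import Callable, List, Optional, Tuple

# A (hate speech, counter narrative) pair; placeholder strings only.
Pair = Tuple[str, str]


def run_loops(
    seed: List[Pair],
    generate: Callable[[List[Pair], int], List[Pair]],
    review: Callable[[Pair], Optional[Pair]],
    n_loops: int,
    samples_per_loop: int,
) -> List[Pair]:
    """Grow a dataset over several loops (hypothetical helper).

    In each loop the generator sees all data accepted so far, proposes new
    candidates, and a reviewer either returns a (possibly post-edited) pair
    or None to discard it.
    """
    dataset = list(seed)
    for _ in range(n_loops):
        candidates = generate(dataset, samples_per_loop)
        for candidate in candidates:
            checked = review(candidate)
            # Keep only reviewed pairs that are not already in the dataset.
            if checked is not None and checked not in dataset:
                dataset.append(checked)
    return dataset


def toy_generate(training_data: List[Pair], n: int) -> List[Pair]:
    """Assumed stand-in for fine-tuning a language model and sampling from it."""
    out = []
    for i in range(n):
        hs, cn = training_data[i % len(training_data)]
        out.append((hs, f"{cn} (variant {len(training_data) + i})"))
    return out


def toy_review(pair: Pair) -> Optional[Pair]:
    """Assumed stand-in for expert review: discard empty text, else post-edit."""
    hs, cn = pair
    cn = cn.strip()
    if not cn:
        return None  # reviewer rejects the candidate
    if not cn.endswith("."):
        cn += "."  # reviewer post-edits the candidate
    return (hs, cn)


if __name__ == "__main__":
    seed_pairs = [("<HS A>", "CN A"), ("<HS B>", "CN B")]
    final = run_loops(seed_pairs, toy_generate, toy_review, n_loops=3, samples_per_loop=2)
    print(len(final))
```

The point of the sketch is the data flow: each loop's accepted output becomes part of the next loop's training data, so the reviewer's decisions shape what the generator proposes later.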

Code & Models

Repositories

marcoguerini/CONAN (Official)
