Explainability and Hate Speech: Structured Explanations Make Social Media Moderators Faster

Agostina Calabrese; Leonardo Neves; Neil Shah; Maarten W. Bos; Björn Ross; Mirella Lapata; Francesco Barbieri

arXiv:2406.04106·cs.CL·June 7, 2024

TL;DR

This study investigates how different types of model-generated explanations affect the speed at which social media moderators judge hate speech. It finds that structured explanations reduce decision time by 7.4%, while generic explanations have no effect on speed.

Contribution

The paper demonstrates that structured explanations can effectively speed up real-world moderation decisions, a novel insight for AI-assisted content moderation.

Findings

01. Structured explanations reduce moderation decision time by 7.4%.

02. Generic explanations are often ignored by moderators.

03. Generic explanations have no significant impact on moderation speed.

Abstract

Content moderators play a key role in keeping the conversation on social media healthy. While the high volume of content they need to judge represents a bottleneck to the moderation pipeline, no studies have explored how models could support them to make faster decisions. There is, by now, a vast body of research into detecting hate speech, sometimes explicitly motivated by a desire to help improve content moderation, but published research using real content moderators is scarce. In this work we investigate the effect of explanations on the speed of real-world moderators. Our experiments show that while generic explanations do not affect their speed and are often ignored, structured explanations lower moderators' decision making time by 7.4%.
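To make the 7.4% figure concrete, the following minimal sketch shows how a relative reduction in mean decision time can be computed. The function name `relative_reduction` and all timing values are hypothetical, chosen only for illustration; they are not data from the paper, and the paper's own analysis may use a different statistical procedure.

```python
from statistics import mean


def relative_reduction(baseline_times, treated_times):
    """Return the percentage reduction in mean decision time.

    Hypothetical helper for illustration; not the authors' analysis code.
    """
    base = mean(baseline_times)
    treated = mean(treated_times)
    return (base - treated) / base * 100.0


# Hypothetical per-post decision times in seconds (NOT data from the paper).
no_explanation = [50.0, 48.0, 52.0, 50.0]
structured = [46.3, 44.3, 48.3, 46.3]

reduction = relative_reduction(no_explanation, structured)
print(f"Relative reduction in mean decision time: {reduction:.1f}%")
```

With these made-up numbers the mean drops from 50.0 to 46.3 seconds, which corresponds to the same relative size of effect the abstract reports for structured explanations.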

Code & Models

Repositories

Ago3/structured_explanations_make_moderators_faster (official)

Videos

Explainability and Hate Speech: Structured Explanations Make Social Media Moderators Faster

Taxonomy

Topics: Hate Speech and Cyberbullying Detection
