Explainability and Hate Speech: Structured Explanations Make Social Media Moderators Faster
Agostina Calabrese, Leonardo Neves, Neil Shah, Maarten W. Bos, Björn Ross, Mirella Lapata, Francesco Barbieri

TL;DR
This study investigates how different types of model-generated explanations affect social media moderators' speed in identifying hate speech, finding that structured explanations significantly reduce decision time.
Contribution
The paper demonstrates that structured explanations can effectively speed up real-world moderation decisions, a novel insight for AI-assisted content moderation.
Findings
Structured explanations reduce moderation decision time by 7.4%
Generic explanations are often ignored by moderators
No significant speed impact from generic explanations
Abstract
Content moderators play a key role in keeping the conversation on social media healthy. While the high volume of content they need to judge represents a bottleneck to the moderation pipeline, no studies have explored how models could support them to make faster decisions. There is, by now, a vast body of research into detecting hate speech, sometimes explicitly motivated by a desire to help improve content moderation, but published research using real content moderators is scarce. In this work we investigate the effect of explanations on the speed of real-world moderators. Our experiments show that while generic explanations do not affect their speed and are often ignored, structured explanations lower moderators' decision making time by 7.4%.
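To illustrate what a 7.4% reduction in decision time means, the following minimal sketch computes the relative drop in mean per-item decision time between a baseline condition and a structured-explanation condition. It assumes the reported figure is a relative reduction in mean time, although the paper's exact statistic may differ. The function name and the sample times are hypothetical and are not data from the study.

```python
from statistics import mean


def relative_time_reduction(baseline_times, treatment_times):
    """Percentage drop in mean decision time, treatment vs. baseline.

    Assumption: both inputs are non-empty sequences of positive per-item
    decision times (e.g. seconds). A positive result means the treatment
    condition (e.g. structured explanations) was faster on average.
    """
    base = mean(baseline_times)
    treat = mean(treatment_times)
    return 100.0 * (base - treat) / base


# Hypothetical per-post decision times in seconds, for illustration only.
baseline = [10.0, 12.0, 8.0]     # mean 10.0
structured = [9.0, 11.0, 7.78]   # mean 9.26
print(round(relative_time_reduction(baseline, structured), 1))  # 7.4
```

Under this reading, a moderator who averaged 10 s per post without explanations would average about 9.26 s with structured explanations.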
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
