Effectiveness of Counter-Speech against Abusive Content: A Multidimensional Annotation and Classification Study
Greta Damo, Elena Cabrio, Serena Villata

TL;DR
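This study introduces a computational framework that evaluates the effectiveness of counter-speech against online hate speech along six dimensions (Clarity, Evidence, Emotional Appeal, Rebuttal, Audience Adaptation, and Fairness). It is supported by 4,214 annotated counter-speech instances and two classification strategies, multi-task and dependency-based, that reach 0.94 and 0.96 average F1 respectively.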
Contribution
It presents a novel multidimensional annotation scheme and classification methods for assessing counter-speech, along with a new linguistic resource for the community.
Findings
Strong classification performance for counter-speech effectiveness: average F1 of 0.94 (multi-task) and 0.96 (dependency-based), on both expert- and user-written counter-speech.
Strong interdependence observed among the six defined dimensions.
A publicly available annotated dataset of 4,214 counter-speech instances.
Abstract
Counter-speech (CS) is a key strategy for mitigating online Hate Speech (HS), yet defining the criteria to assess its effectiveness remains an open challenge. We propose a novel computational framework for CS effectiveness classification, grounded in linguistics, communication and argumentation concepts. Our framework defines six core dimensions - Clarity, Evidence, Emotional Appeal, Rebuttal, Audience Adaptation, and Fairness - which we use to annotate 4,214 CS instances from two benchmark datasets, resulting in a novel linguistic resource released to the community. In addition, we propose two classification strategies, multi-task and dependency-based, achieving strong results (0.94 and 0.96 average F1 respectively on both expert- and user-written CS), outperforming standard baselines, and revealing strong interdependence among dimensions.
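To make the reported "average F1" concrete, the sketch below scores each of the six dimensions separately and then takes an unweighted mean. It is only an illustration: the abstract does not state the label set or the averaging scheme, so the binary labels (1 = dimension satisfied, 0 = not), the unweighted mean, and the names `DIMENSIONS`, `binary_f1`, and `average_f1` are all assumptions, not the authors' evaluation code.

```python
# Illustrative sketch only: binary labels per dimension and an unweighted
# mean over dimensions are assumptions, not details taken from the paper.

DIMENSIONS = (
    "clarity",
    "evidence",
    "emotional_appeal",
    "rebuttal",
    "audience_adaptation",
    "fairness",
)


def binary_f1(gold, pred):
    """F1 of the positive class (label 1) for two equal-length label lists."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f1(gold_by_dim, pred_by_dim):
    """Return (mean F1 over the six dimensions, per-dimension F1 scores)."""
    scores = {
        dim: binary_f1(gold_by_dim[dim], pred_by_dim[dim])
        for dim in DIMENSIONS
    }
    return sum(scores.values()) / len(DIMENSIONS), scores
```

Given two dictionaries that map each dimension name to a list of gold and predicted labels, `average_f1` returns the mean score together with the per-dimension breakdown, which helps show whether a high average hides a weak dimension.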
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
