CSEval: Towards Automated, Multi-Dimensional, and Reference-Free Counterspeech Evaluation using Auto-Calibrated LLMs

Amey Hengle; Aswini Kumar; Anil Bandhakavi; Tanmoy Chakraborty

arXiv:2501.17581·cs.CL·February 11, 2025

Open Access

TL;DR

This paper introduces CSEval, a comprehensive dataset and framework for evaluating counterspeech quality across multiple dimensions, and proposes Auto-CSEval, a prompt-based LLM method that better aligns with human judgement than traditional metrics.

Contribution

The paper presents a novel multi-dimensional counterspeech evaluation framework and a new auto-calibrated LLM-based scoring method that improves correlation with human assessments.

Findings

01. Auto-CSEval outperforms traditional metrics such as ROUGE and BERTScore.

02. The framework effectively captures contextual relevance, aggressiveness, argument-coherence, and suitability.

03. Experiments show that Auto-CSEval aligns more closely with human judgement than these traditional metrics.
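
Alignment with human judgement is usually quantified as a correlation between a metric's scores and human ratings of the same outputs. The sketch below illustrates that idea with a Spearman rank correlation written against the Python standard library. It is not the paper's evaluation code: the choice of Spearman correlation, the function names, and the toy scores are all assumptions made for illustration, and the numbers are not results from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed inputs."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: human ratings for five counterspeech responses and
# the scores two made-up automatic metrics assign to the same responses.
human = [1, 2, 3, 4, 5]
metric_a = [0.10, 0.35, 0.30, 0.70, 0.90]
metric_b = [0.50, 0.20, 0.60, 0.30, 0.40]

print("metric_a vs human:", round(spearman(human, metric_a), 3))
print("metric_b vs human:", round(spearman(human, metric_b), 3))
```

In this toy setup, metric_a orders the responses almost exactly as the human raters do and so correlates strongly, while metric_b does not. A comparison of this kind, run separately for each quality dimension, is one way a claim of better alignment can be checked.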

Abstract

Counterspeech has emerged as a popular and effective strategy for combating online hate speech, sparking growing research interest in automating its generation using language models. However, the field still lacks standardised evaluation protocols and reliable automated evaluation metrics that align with human judgement. Current automatic evaluation methods, primarily based on similarity metrics, do not effectively capture the complex and independent attributes of counterspeech quality, such as contextual relevance, aggressiveness, or argumentative coherence. This has led to an increased dependency on labor-intensive human evaluations to assess automated counterspeech generation methods. To address these challenges, we introduce CSEval, a novel dataset and framework for evaluating counterspeech quality across four dimensions: contextual-relevance, aggressiveness, argument-coherence,…

Taxonomy

Topics: Speech Recognition and Synthesis

Methods: ALIGN