Assessing the Human Likeness of AI-Generated Counterspeech
Xiaoying Song, Sujana Mamidisetty, Eduardo Blanco, Lingzi Hong

TL;DR
This study evaluates how human-like AI-generated counterspeech is. Both simple classifiers and humans can easily distinguish it from human-written counterspeech, and the two differ in linguistic characteristics, politeness, and specificity.
Contribution
The paper evaluates the human likeness of AI-generated counterspeech across several LLM-based generation strategies, analyzes how it differs linguistically from human-written counterspeech, and releases the dataset publicly for future research.
Findings
AI-generated counterspeech can be easily distinguished from human-written counterspeech by both simple classifiers and humans
AI-generated and human-written counterspeech differ in linguistic characteristics, politeness, and specificity
The dataset used is publicly available for further research
Abstract
Counterspeech is a targeted response to counteract and challenge abusive or hateful content. It effectively curbs the spread of hatred and fosters constructive online communication. Previous studies have proposed different strategies for automatically generated counterspeech. Evaluations, however, focus on relevance, surface form, and other shallow linguistic characteristics. This paper investigates the human likeness of AI-generated counterspeech, a critical factor influencing effectiveness. We implement and evaluate several LLM-based generation strategies, and discover that AI-generated and human-written counterspeech can be easily distinguished by both simple classifiers and humans. Further, we reveal differences in linguistic characteristics, politeness, and specificity. The dataset used in this study is publicly available for further research.
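To make the "simple classifier" finding concrete, here is a minimal sketch of a bag-of-words naive Bayes classifier that separates AI-generated from human-written replies. It is an illustration only: the abstract does not say which classifiers the authors used, and the class name `NaiveBayes`, the `tokenize` helper, and the four toy sentences are all assumptions made up for this example, not samples from the paper's dataset.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes; punctuation is dropped.
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-alpha smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {y: Counter() for y in self.labels}
        for text, y in zip(texts, labels):
            self.word_counts[y].update(tokenize(text))
        # Vocabulary is the union of tokens seen in any class.
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for y in self.labels:
            denom = sum(self.word_counts[y].values()) + self.alpha * len(self.vocab)
            score = math.log(self.doc_counts[y] / n_docs)  # class prior
            for tok in tokenize(text):
                if tok in self.vocab:  # ignore tokens never seen in training
                    score += math.log((self.word_counts[y][tok] + self.alpha) / denom)
            scores[y] = score
        return max(scores, key=scores.get)


# Hypothetical toy data, invented for this sketch (not from the paper's dataset).
TOY_TEXTS = [
    "I understand your frustration, however it is important to treat everyone with respect.",
    "It is important to remember that everyone deserves respect and kindness.",
    "nah that's just wrong, people are people",
    "lol no. stop spreading this garbage",
]
TOY_LABELS = ["ai", "ai", "human", "human"]

clf = NaiveBayes().fit(TOY_TEXTS, TOY_LABELS)
print(clf.predict("it is important to respect everyone"))
print(clf.predict("lol that's garbage"))
```

With real data, the same interface would be trained on labeled counterspeech and scored on a held-out split. The point of the sketch is only that word-frequency cues alone can carry a strong signal when the two sources differ in style.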
Taxonomy
Topics: Ethics and Social Impacts of AI
