Can Language Model Moderators Improve the Health of Online Discourse?

Hyundong Cho; Shuai Liu; Taiwei Shi; Darpan Jain; Basem Rizk; Yuyang Huang; Zixun Lu; Nuan Wen; Jonathan Gratch; Emilio Ferrara; Jonathan May

arXiv:2311.10781 · cs.CL · May 7, 2024 · 1 citation

TL;DR

This paper explores the potential of language models to assist online community moderation, proposing a new evaluation framework and finding that models can give fair feedback but struggle to promote respectful behavior.

Contribution

The paper introduces a systematic evaluation framework for language models as conversational moderators and presents the first known study of their effectiveness in realistic moderation scenarios.

Findings

1. Models can give fair feedback on toxicity.

2. Models struggle to increase user respect and cooperation.

3. Evaluation framework enables realistic assessment of moderation capabilities.

Abstract

Conversational moderation of online communities is crucial to maintaining civility for a constructive environment, but it is challenging to scale and harmful to moderators. The inclusion of sophisticated natural language generation modules as a force multiplier to aid human moderators is a tantalizing prospect, but adequate evaluation approaches have so far been elusive. In this paper, we establish a systematic definition of conversational moderation effectiveness grounded on moderation literature and establish design criteria for conducting realistic yet safe evaluation. We then propose a comprehensive evaluation framework to assess models' moderation capabilities independently of human intervention. With our framework, we conduct the first known study of language models as conversational moderators, finding that appropriately prompted models that incorporate insights from social…

Taxonomy

Topics: Hate Speech and Cyberbullying Detection

Methods: Focus