Can AI Moderate Online Communities?
Henrik Axelsen, Johannes Rude Jensen, Sebastian Axelsen, Valdemar, Licht, Omri Ross

TL;DR
This paper explores using large language models to automate moderation in online communities, demonstrating promising results in identifying toxic behavior and fostering positive interactions.
Contribution
It introduces a rapid development framework utilizing open-access LLMs for online community moderation, advancing practical applications in content management.
Findings
LLMs can effectively identify toxic comments and positive contributions.
Open-access models like GPT enable faster development of moderation tools.
Models perform well in both non-contextual and contextual moderation tasks.
Abstract
The task of cultivating healthy communication in online communities becomes increasingly urgent, as gaming and social media experiences become progressively more immersive and life-like. We approach the challenge of moderating online communities by training student models using a large language model (LLM). We use zero-shot learning models to distill and expand datasets followed by a few-shot learning and a fine-tuning approach, leveraging open-access generative pre-trained transformer models (GPT) from OpenAI. Our preliminary findings suggest, that when properly trained, LLMs can excel in identifying actor intentions, moderating toxic comments, and rewarding positive contributions. The student models perform above-expectation in non-contextual assignments such as identifying classically toxic behavior and perform sufficiently on contextual assignments such as identifying positive…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsHate Speech and Cyberbullying Detection · Software Engineering Research
