TL;DR
This paper introduces a novel LLM-powered simulation framework to evaluate online moderation strategies by modeling social behaviors and testing interventions in a controlled, counterfactual environment.
Contribution
It presents the first simulation tool leveraging LLMs for counterfactual evaluation of moderation, capturing social contagion and personalization effects.
Findings
LLM-based agents exhibit realistic social behaviors.
Social contagion phenomena emerge in simulations.
Personalized moderation strategies outperform generic ones.
Abstract
Online Social Networks (OSNs) widely adopt content moderation to mitigate the spread of abusive and toxic discourse. Nonetheless, the real effectiveness of moderation interventions remains unclear due to the high cost of data collection and limited experimental control. The latest developments in Natural Language Processing pave the way for a new evaluation approach. Large Language Models (LLMs) can be successfully leveraged to enhance Agent-Based Modeling and simulate human-like social behavior with an unprecedented degree of believability. Yet, existing tools do not support simulation-based evaluation of moderation strategies. We fill this gap by designing an LLM-powered simulator of OSN conversations enabling a parallel, counterfactual simulation where toxic behavior is influenced by moderation interventions, keeping all else equal. We conduct extensive experiments, unveiling the…
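The "parallel, counterfactual" setup in the abstract can be illustrated with a toy sketch. This is not the paper's simulator: the LLM agents are replaced by a simple random toxicity draw, and every name here (`Agent`, `simulate`, `counterfactual_effect`, the `damping` factor) is a hypothetical stand-in. The only idea it tries to show is that both arms share the same seed, so they see identical random draws and differ only in whether moderation is applied.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    toxicity: float  # hypothetical baseline probability of posting a toxic message


def simulate(agents, n_turns, seed, moderate, damping=0.5):
    """Run one toy conversation and return the number of toxic messages.

    Assumption: an LLM call would normally generate each message; here a
    random draw against the speaker's toxicity propensity stands in for it.
    """
    rng = random.Random(seed)  # same seed => same sequence of draws in both arms
    propensity = {a.name: a.toxicity for a in agents}
    toxic_count = 0
    for _ in range(n_turns):
        speaker = rng.choice(agents)  # who posts this turn
        draw = rng.random()           # drawn every turn, moderated or not
        if draw < propensity[speaker.name]:
            toxic_count += 1
            if moderate:
                # Hypothetical intervention: a warning lowers the speaker's
                # future propensity to post toxic content.
                propensity[speaker.name] *= damping
    return toxic_count


def counterfactual_effect(agents, n_turns, seed):
    """Compare the unmoderated and moderated arms under identical randomness."""
    baseline = simulate(agents, n_turns, seed, moderate=False)
    moderated = simulate(agents, n_turns, seed, moderate=True)
    return baseline, moderated


agents = [Agent("a", 0.6), Agent("b", 0.2), Agent("c", 0.4)]
baseline, moderated = counterfactual_effect(agents, n_turns=200, seed=7)
print(f"toxic messages without moderation: {baseline}, with moderation: {moderated}")
```

Because both arms consume the random stream in exactly the same order, any difference in the toxic-message count is attributable to the intervention alone. In this toy model the moderated count can never exceed the baseline count, which is a property of the sketch and not a claim about the paper's results.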