An Evaluation of Chat Safety Moderations in Roblox
Priya Kaushik, Sonja Brown, Rakibul Hasan, Sazzadur Rahaman

TL;DR
This study evaluates the effectiveness of Roblox's automated chat moderation by analyzing approximately 2 million chat messages, and finds that many unsafe messages evade detection, allowing harmful behavior to continue.
Contribution
The paper presents a large-scale analysis of chat moderation effectiveness using LLMs and manual coding, highlighting evasion tactics and safety gaps.
Findings
Numerous unsafe messages related to grooming, harassment, and violence escape moderation.
Users employ various techniques to evade detection and continue harmful messaging.
Current moderation systems are insufficient to prevent all unsafe communications.
Abstract
Roblox is among the most popular online gaming platforms, used by hundreds of millions of users every day. A substantial portion of these users are underage and therefore at greater risk, since abusive users may use Roblox's real-time chat interface to make initial contact with potential victims. Roblox employs automated chat moderation mechanisms to detect potentially abusive messages; however, to date, their effectiveness has not been independently investigated. Toward this goal, we collected approximately 2 million chat messages from four games across multiple age groups and analyzed them to evaluate the moderation system. These messages were collected from public game servers following ethical and legal norms as well as Roblox's terms of service. We use this corpus to qualitatively study which types of unsafe chats escape the moderation system and how policy-violating users…
