Improving Moderation of Online Discussions via Interpretable Neural Models
Andrej Švec, Matúš Pikuliak, Marián Šimko, Mária Bieliková (Slovak University of Technology in Bratislava, Bratislava, Slovakia)

TL;DR
This paper introduces an interpretable neural network approach to assist online discussion moderation by automatically detecting and highlighting inappropriate comments, thereby easing the burden on human moderators.
Contribution
It presents a novel two-step neural model that detects and highlights inappropriate comments, improving moderation efficiency and interpretability.
Findings
Effective detection of inappropriate comments on a major Slovak news discussion platform
Highlights problematic parts within comments for faster moderation
Demonstrates potential to assist human moderators in online discussions
Abstract
The growing number of comments makes online discussions difficult to moderate by human moderators alone. Antisocial behavior is a common occurrence that often discourages other users from participating in the discussion. We propose a neural-network-based method that partially automates the moderation process. It consists of two steps. First, we detect inappropriate comments for moderators to see. Second, we highlight inappropriate parts within these comments to make moderation faster. We evaluated our method on data from a major Slovak news discussion platform.
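The two-step flow described in the abstract (detect, then highlight) can be illustrated with a minimal sketch. This is not the authors' neural model: the token weights, bias, thresholds, and function names below are all hypothetical, and a hand-written lookup table stands in for whatever per-token relevance a trained network would produce.

```python
import math

# Hypothetical per-token weights standing in for a trained model's
# per-token relevance scores; these values are not taken from the paper.
TOKEN_WEIGHTS = {"idiot": 2.5, "stupid": 2.0, "hate": 1.5}
BIAS = -2.0  # assumed offset so neutral comments score below 0.5


def token_scores(comment):
    """Return (token, weight) pairs; unknown tokens get weight 0."""
    return [(tok, TOKEN_WEIGHTS.get(tok.lower().strip(".,!?"), 0.0))
            for tok in comment.split()]


def detect(comment, threshold=0.5):
    """Step 1: flag the comment if its sigmoid score reaches the threshold."""
    total = sum(w for _, w in token_scores(comment)) + BIAS
    prob = 1.0 / (1.0 + math.exp(-total))
    return prob >= threshold, prob


def highlight(comment, min_weight=1.0):
    """Step 2: wrap tokens whose weight is at least min_weight in [[ ]]."""
    return " ".join(f"[[{tok}]]" if w >= min_weight else tok
                    for tok, w in token_scores(comment))


def triage(comments):
    """Run both steps; only flagged comments are highlighted and returned."""
    queue = []
    for c in comments:
        flagged, prob = detect(c)
        if flagged:
            queue.append((prob, highlight(c)))
    # Comments with the highest score come first in the moderator queue.
    return sorted(queue, reverse=True)
```

Calling `triage` on a list of comments returns only the flagged ones, each with its suspicious tokens marked, so a moderator reviews a shorter queue and can see at a glance which parts triggered the flag.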
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
