MoMoE: Mixture of Moderation Experts Framework for AI-Assisted Online Governance

Agam Goyal; Xianyang Zhan; Yilun Chen; Koustuv Saha; Eshwar Chandrasekharan

arXiv:2505.14483 · cs.CL · October 24, 2025

Open Access

TL;DR

MoMoE is a modular, explainable framework for online content moderation that works across communities without a separate model for each one, matching or surpassing strong fine-tuned baselines on Micro-F1 while producing concise post-hoc explanations.

Contribution

Introduces MoMoE, a scalable, multi-expert framework with post-hoc explanations for cross-community content moderation, reducing the need for per-community fine-tuning.

Findings

01. On 30 unseen subreddits, the best MoMoE-Community and MoMoE-NormVio variants reach Micro-F1 scores of 0.72 and 0.67, respectively.

02. Provides reliable, concise explanations for moderation decisions.

03. Demonstrates steady performance of norm-violation experts across domains.

Abstract

Large language models (LLMs) have shown great potential in flagging harmful content in online communities. Yet, existing approaches for moderation require a separate model for every community and are opaque in their decision-making, limiting real-world adoption. We introduce Mixture of Moderation Experts (MoMoE), a modular, cross-community framework that adds post-hoc explanations to scalable content moderation. MoMoE orchestrates four operators -- Allocate, Predict, Aggregate, Explain -- and is instantiated as seven community-specialized experts (MoMoE-Community) and five norm-violation experts (MoMoE-NormVio). On 30 unseen subreddits, the best variants obtain Micro-F1 scores of 0.72 and 0.67, respectively, matching or surpassing strong fine-tuned baselines while consistently producing concise and reliable explanations. Although community-specialized experts deliver the highest peak…
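
The four operators named in the abstract can be read as a small pipeline. The sketch below is only an illustration of that reading, not the authors' implementation: the function signatures, the top-k weighting, the 0.5 threshold, the keyword-based toy experts, and the template-based explanation (a stand-in for the paper's post-hoc explanations) are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical type: an expert maps a comment to a violation score in [0, 1].
Expert = Callable[[str], float]


@dataclass
class Decision:
    flagged: bool
    score: float
    explanation: str


def allocate(comment: str, relevance: Dict[str, Expert], top_k: int = 2) -> Dict[str, float]:
    """Allocate: choose the top_k most relevant experts and normalize their weights."""
    scores = {name: max(fn(comment), 0.0) for name, fn in relevance.items()}
    chosen = sorted(scores, key=scores.get, reverse=True)[:top_k]
    total = sum(scores[name] for name in chosen)
    if total == 0.0:
        # No expert looks relevant: fall back to uniform weights.
        return {name: 1.0 / len(chosen) for name in chosen}
    return {name: scores[name] / total for name in chosen}


def predict(comment: str, experts: Dict[str, Expert], weights: Dict[str, float]) -> Dict[str, float]:
    """Predict: query only the experts that were allocated."""
    return {name: experts[name](comment) for name in weights}


def aggregate(preds: Dict[str, float], weights: Dict[str, float]) -> float:
    """Aggregate: weighted average of the expert scores."""
    return sum(weights[name] * preds[name] for name in preds)


def explain(preds: Dict[str, float], weights: Dict[str, float], score: float, flagged: bool) -> str:
    """Explain: template-based stand-in that names the most influential expert."""
    top = max(preds, key=lambda name: weights[name] * preds[name])
    verdict = "Flagged" if flagged else "Allowed"
    return f"{verdict} (score {score:.2f}); strongest signal from expert '{top}' ({preds[top]:.2f})"


def moderate(comment: str, experts: Dict[str, Expert], relevance: Dict[str, Expert],
             threshold: float = 0.5) -> Decision:
    weights = allocate(comment, relevance)
    preds = predict(comment, experts, weights)
    score = aggregate(preds, weights)
    flagged = score >= threshold
    return Decision(flagged, score, explain(preds, weights, score, flagged))


def keyword_expert(keywords: set) -> Expert:
    """Toy expert: scores a comment by counting keyword hits (capped at 1.0)."""
    def score(comment: str) -> float:
        hits = sum(word.strip(".,!?") in keywords for word in comment.lower().split())
        return min(1.0, hits / 2)
    return score


# Toy experts and relevance functions, purely for demonstration.
experts = {
    "harassment": keyword_expert({"idiot", "stupid"}),
    "spam": keyword_expert({"buy", "discount"}),
}
relevance = {name: (lambda c, f=fn: f(c) + 0.1) for name, fn in experts.items()}

print(moderate("You are a stupid idiot", experts, relevance).explanation)
print(moderate("Nice photo, thanks for sharing", experts, relevance).explanation)
```

Keeping each operator as a separate function mirrors the modularity the abstract emphasizes: under this reading, swapping community-specialized experts for norm-violation experts would change only the `experts` and `relevance` dictionaries, not the pipeline.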


Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Misinformation and Its Impacts · Spam and Phishing Detection