TL;DR
PluRule is a multimodal, multilingual benchmark for AI moderation in pluralistic social media communities. It shows that current models struggle to detect nuanced, community-specific rule violations across languages.
Contribution
The paper presents PluRule, a new multimodal, multilingual benchmark of 13,371 rule violations across 1,989 Reddit communities, covering 2,885 community-specific rules in 9 languages. Evaluations on it expose the limitations of current AI moderation models.
Findings
State-of-the-art vision-language models, including GPT-5.2 with high reasoning, perform only slightly better than a trivial baseline.
Larger models and more context yield marginal improvements.
Universal rules like civility are easier to detect.
Abstract
Social media are shifting towards pluralism: community-governed platforms where groups define their own norms. What violates rules in one community may be perfectly acceptable in another. Can AI models help moderate such pluralistic communities? We formalize the task as a multiple-choice problem, mirroring how human moderators operate in the real world: given a comment and its surrounding context, identify which specific rule, if any, is violated. We introduce PluRule, a multimodal, multilingual benchmark for detecting 13,371 rule violations across 1,989 Reddit communities spanning 2,885 rules in 9 languages. Using this benchmark, we show that state-of-the-art vision-language models struggle significantly: even GPT-5.2 with high reasoning performs only slightly better than a trivial baseline. We also find that bigger models and increased context provide marginal gains, and universal…
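The abstract frames moderation as a multiple-choice question: the candidate answers are the community's own rules plus a "no violation" choice, and the model must pick the one that applies to a comment given its context. Below is a minimal, text-only sketch of that framing. All names (ModerationItem, format_prompt, accuracy, chance_accuracy, NO_VIOLATION), the wording of the "none" option, and the lettered prompt layout are hypothetical illustrations, not PluRule's actual data format. The sketch also ignores images, although the benchmark is multimodal. chance_accuracy is an illustrative uniform-guessing reference point and is not necessarily the trivial baseline the paper reports.

```python
from dataclasses import dataclass

# Hypothetical wording for the extra "no violation" answer.
NO_VIOLATION = "None of the rules are violated"


@dataclass
class ModerationItem:
    comment: str        # the comment under review
    context: list[str]  # surrounding context, e.g. post and parent comments (hypothetical field)
    rules: list[str]    # the community's own rules, in order
    label: int          # index of the correct option; len(rules) means "no violation"

    def options(self) -> list[str]:
        # Every community rule is a candidate answer, plus an explicit "none" answer.
        return self.rules + [NO_VIOLATION]


def format_prompt(item: ModerationItem) -> str:
    """Render one item as a lettered multiple-choice question (text only)."""
    lines = ["Context:"] + [f"- {c}" for c in item.context]
    lines.append(f"Comment: {item.comment}")
    lines.append("Which rule, if any, does this comment violate?")
    for i, option in enumerate(item.options()):
        lines.append(f"({chr(ord('A') + i)}) {option}")
    return "\n".join(lines)


def accuracy(predictions: list[int], items: list[ModerationItem]) -> float:
    """Fraction of items whose predicted option index matches the label."""
    correct = sum(p == item.label for p, item in zip(predictions, items))
    return correct / len(items)


def chance_accuracy(items: list[ModerationItem]) -> float:
    """Expected accuracy of guessing uniformly among each item's own options."""
    return sum(1 / len(item.options()) for item in items) / len(items)


if __name__ == "__main__":
    example = ModerationItem(
        comment="Only an idiot would put pineapple on pizza.",
        context=["Post: What's your favourite pizza topping?"],
        rules=["Be civil", "No self-promotion"],
        label=0,  # the comment breaks "Be civil"
    )
    print(format_prompt(example))
    print(accuracy([0], [example]))    # 1.0
    print(chance_accuracy([example]))  # three options, so 1/3
```

Because each community supplies its own rule list, the number of options varies from item to item. For that reason the uniform-guessing reference point is averaged per item rather than fixed at a single value.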
