GMP: A Benchmark for Content Moderation under Co-occurring Violations and Dynamic Rules

Houde Dong; Yifei She; Kai Ye; Liangcai Su; Chenxiong Qian; Jie Hao

arXiv:2603.01724 · cs.AI · March 3, 2026

Open Access

TL;DR

This paper introduces GMP, a benchmark designed to evaluate AI content moderation systems' ability to handle multiple simultaneous violations and adapt to changing moderation rules, addressing limitations of current models in real-world scenarios.

Contribution

The paper presents GMP, a new benchmark that tests AI moderation under co-occurring violations and dynamic rules, highlighting gaps in existing evaluation methods.

Findings

01. Current LLMs struggle with co-occurring violations.
02. Dynamic rules significantly impact moderation accuracy.
03. GMP reveals limitations of existing models in complex scenarios.

Abstract

Online content moderation is essential for maintaining a healthy digital environment, and reliance on AI for this task continues to grow. Consider a user comment that uses national stereotypes to insult a politician. This example illustrates two critical challenges in real-world scenarios: (1) co-occurring violations, where a single post violates multiple policies (e.g., prejudice and personal attacks); and (2) dynamic moderation rules, where whether content counts as a violation depends on platform-specific guidelines that evolve across contexts. The intersection of co-occurring harms and dynamically changing rules highlights a core limitation of current AI systems: although large language models (LLMs) are adept at following fixed guidelines, their judgment degrades when policies are unstable or context-dependent. In practice, such shortcomings lead to inconsistent moderation: either…
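The two challenges named in the abstract can be made concrete with a small sketch. It is illustrative only and is not GMP's data format, rule schema, or scoring metric: the `Post` class, the harm labels, the two platform rule sets, and the exact-match scorer are all hypothetical names introduced here. It shows how one post can carry several harms at once, and how the set of violations a moderator is expected to report shifts when the active rules change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    text: str
    harms: frozenset  # every harm category present in the post (hypothetical labels)


def violations(post: Post, active_rules: set) -> set:
    """Return the harms in the post that the currently active rule set prohibits."""
    return set(post.harms) & active_rules


def exact_match(predicted: set, gold: set) -> bool:
    """Strict multi-label check: all violations found and nothing extra reported."""
    return predicted == gold


# The abstract's example: one comment, two co-occurring harms.
post = Post(
    text="<comment using national stereotypes to insult a politician>",
    harms=frozenset({"prejudice", "personal_attack"}),
)

# Two hypothetical platform policies (assumptions, not taken from the paper).
platform_a = {"prejudice", "personal_attack", "spam"}
platform_b = {"prejudice", "spam"}  # this platform tolerates attacks on public figures

gold_a = violations(post, platform_a)
gold_b = violations(post, platform_b)

# A moderator that reports only the single most salient harm.
single_label_prediction = {"prejudice"}
print(exact_match(single_label_prediction, gold_a))
print(exact_match(single_label_prediction, gold_b))
```

Under these assumptions, the same single-label prediction is incomplete under one policy and correct under the other, which is why a moderator has to be judged against the rule set in force rather than against a fixed label.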

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Spam and Phishing Detection · Misinformation and Its Impacts