Policy-as-Prompt: Rethinking Content Moderation in the Age of Large Language Models
Konstantina Palla, José Luis Redondo García, Claudia Hauff, Francesco Fabbri, Henrik Lindström, Daniel R. Taber, Andreas Damianou, Mounia Lalmas

TL;DR
This paper introduces the policy-as-prompt framework, leveraging large language models for flexible, dynamic content moderation, and discusses key technical, sociotechnical, organizational, and governance challenges and solutions.
Contribution
It formalizes the policy-as-prompt approach and identifies critical challenges across multiple domains to guide the design of future scalable moderation systems.
Findings
Identified five key challenges in policy-as-prompt implementation.
Analyzed technical, sociotechnical, organizational, and governance issues.
Discussed mitigation strategies for effective LLM-based moderation.
Abstract
Content moderation plays a critical role in shaping safe and inclusive online environments, balancing platform standards, user expectations, and regulatory frameworks. Traditionally, this process involves operationalising policies into guidelines, which are then used by downstream human moderators for enforcement, or to further annotate datasets for training machine learning moderation models. However, recent advancements in large language models (LLMs) are transforming this landscape. These models can now interpret policies directly as textual inputs, eliminating the need for extensive data curation. This approach offers unprecedented flexibility, as moderation can be dynamically adjusted through natural language interactions. This paradigm shift raises important questions about how policies are operationalised and the implications for content moderation practices. In this paper, we…
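The core idea in the abstract is that an LLM can read a moderation policy directly as text, with no labelled training data in between. A minimal sketch of that pattern follows. Everything in it is a hypothetical illustration, not the authors' implementation or any provider's API: `Policy`, `build_prompt`, `moderate`, the two-label scheme, the `NEEDS_REVIEW` fallback, and `stub_llm` (a keyword stand-in for a real model call) are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: any function that takes a prompt and returns the model's text reply.
LLMCall = Callable[[str], str]


@dataclass
class Policy:
    """A moderation policy expressed directly as natural-language text (illustrative)."""
    name: str
    text: str
    labels: tuple[str, ...] = ("VIOLATES", "ALLOWED")


def build_prompt(policy: Policy, content: str) -> str:
    """Embed the policy text and the item under review in a single prompt."""
    allowed = ", ".join(policy.labels)
    return (
        "You are a content moderator. Apply the following policy.\n\n"
        f"POLICY ({policy.name}):\n{policy.text}\n\n"
        f"CONTENT:\n{content}\n\n"
        f"Answer with exactly one label from: {allowed}."
    )


def moderate(policy: Policy, content: str, llm: LLMCall) -> str:
    """Return the model's label, or 'NEEDS_REVIEW' if the reply is not a valid label."""
    reply = llm(build_prompt(policy, content)).strip().upper()
    return reply if reply in policy.labels else "NEEDS_REVIEW"


def stub_llm(prompt: str) -> str:
    # Stand-in for a real model (assumption): inspects only the CONTENT section
    # and flags text mentioning "free crypto".
    content = prompt.split("CONTENT:\n", 1)[1]
    return "violates" if "free crypto" in content.lower() else "allowed"


if __name__ == "__main__":
    spam = Policy(name="Spam", text="Remove unsolicited promotions and scams.")
    print(moderate(spam, "Get FREE CRYPTO now!", stub_llm))
    print(moderate(spam, "Nice photo!", stub_llm))
```

Because the policy is plain text inside the prompt, editing `Policy.text` changes what the model is asked to enforce without relabelling data or retraining. Routing unparseable replies to a review label is one assumed way to keep malformed model output out of automated enforcement.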
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Misinformation and Its Impacts · Spam and Phishing Detection
