Policy-as-Prompt: Rethinking Content Moderation in the Age of Large Language Models

Konstantina Palla; José Luis Redondo García; Claudia Hauff; Francesco Fabbri; Henrik Lindström; Daniel R. Taber; Andreas Damianou; Mounia Lalmas

arXiv:2502.18695·cs.CY·February 27, 2025

Open Access

TL;DR

This paper introduces the policy-as-prompt framework, leveraging large language models for flexible, dynamic content moderation, and discusses key technical, sociotechnical, organizational, and governance challenges and solutions.

Contribution

It formalizes the policy-as-prompt approach and identifies critical challenges across multiple domains, guiding future scalable moderation systems.

Findings

01

Identified five key challenges in policy-as-prompt implementation.

02

Analyzed technical, sociotechnical, organizational, and governance issues.

03

Discussed mitigation strategies for effective LLM-based moderation.

Abstract

Content moderation plays a critical role in shaping safe and inclusive online environments, balancing platform standards, user expectations, and regulatory frameworks. Traditionally, this process involves operationalising policies into guidelines, which are then used by downstream human moderators for enforcement, or to further annotate datasets for training machine learning moderation models. However, recent advancements in large language models (LLMs) are transforming this landscape. These models can now interpret policies directly as textual inputs, eliminating the need for extensive data curation. This approach offers unprecedented flexibility, as moderation can be dynamically adjusted through natural language interactions. This paradigm shift raises important questions about how policies are operationalised and the implications for content moderation practices. In this paper, we…
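
To make the idea of a model reading a policy "directly as textual input" concrete, here is a minimal sketch in Python. It is not the authors' implementation: the `Policy` class, the `build_moderation_prompt` and `parse_label` helpers, the label names, and the example policy wording are all hypothetical, and no particular LLM API is assumed (the call to a model is left out). The sketch only shows how policy text and content might be combined into one prompt, and how a change in moderation behaviour becomes an edit to the policy text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """A moderation policy held as plain text (hypothetical structure)."""
    name: str
    text: str


def build_moderation_prompt(policy: Policy, content: str) -> str:
    """Embed the policy text and the content under review in a single prompt."""
    return (
        "You are a content moderator. Apply the policy below.\n\n"
        f"Policy ({policy.name}):\n{policy.text.strip()}\n\n"
        f"Content:\n{content.strip()}\n\n"
        "Answer with exactly one label: VIOLATES or ALLOWED."
    )


def parse_label(model_output: str) -> str:
    """Normalise free-form model output; anything unexpected becomes UNSURE."""
    answer = model_output.strip().upper()
    return answer if answer in {"VIOLATES", "ALLOWED"} else "UNSURE"


# Adjusting moderation means editing the policy text, not relabelling data.
# Both policy wordings below are invented for illustration.
spam_v1 = Policy("spam", "Unsolicited advertising is not allowed.")
spam_v2 = Policy(
    "spam",
    "Unsolicited advertising is not allowed, except in threads tagged #promo.",
)

post = "Buy cheap watches at example.com"
prompt_v1 = build_moderation_prompt(spam_v1, post)
prompt_v2 = build_moderation_prompt(spam_v2, post)
```

The two prompts differ only in the policy paragraph, which is the flexibility the abstract describes. It also hints at why the paper treats prompt wording as a source of risk: small textual changes to the policy are the whole control surface.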

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Misinformation and Its Impacts · Spam and Phishing Detection