Who Decides What Is Harmful? Content Moderation Policy Through A Multi-Agent Personalised Inference Framework
Ewelina Gajewska, Michal Wawer, Katarzyna Budzynska, Jaroslaw A. Chudziak

TL;DR
This paper introduces an LLM-based multi-agent personalised inference framework for content moderation. By tailoring decisions to individual user sensitivities, it improves accuracy by up to 32% over non-personalised baselines.
Contribution
The paper presents a novel multi-agent architecture that personalises content moderation, addressing subjective harm perception and enhancing policy compliance.
Findings
Up to 32% improvement in moderation accuracy over non-personalised baselines.
The framework aligns moderation with individual user sensitivities.
Provides scalable insights for platform governance and digital rights.
Abstract
The increasing scale and complexity of online platforms raise critical policy questions around harmful content, digital well-being, and user autonomy. Traditional content moderation systems rely on centralised, top-down rules, often failing to accommodate the subjective nature of harm perception. This paper proposes an LLM-based multi-agent personalised inference framework that filters content based on the unique sensitivity profiles of individual users. Our architecture combines domain-specific Expert Agents, a Manager Agent that orchestrates content analysis and agent selection, and a Ghost Profile Agent that simulates user perspectives to inform moderation decisions. Evaluated against a range of non-personalised baselines, the system demonstrates up to a 32% improvement in accuracy, showing increased alignment with individual user sensitivities. Beyond technical performance, our…
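The three agent roles named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: every class, method, and field name below is hypothetical, simple keyword matching stands in for the LLM calls a real system would make, and the per-topic threshold is only one assumed way to represent a sensitivity profile.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class UserProfile:
    """Hypothetical sensitivity profile: topic -> threshold in [0, 1].
    A lower threshold means the user is more sensitive to that topic."""
    thresholds: Dict[str, float] = field(default_factory=dict)
    default_threshold: float = 0.5


class ExpertAgent:
    """Stand-in for a domain-specific Expert Agent.
    Assumption: a keyword count replaces the LLM-based analysis."""

    def __init__(self, domain: str, keywords: Set[str]):
        self.domain = domain
        self.keywords = keywords

    def severity(self, text: str) -> float:
        # Toy score: 0.4 per matched keyword, capped at 1.0.
        words = [w.strip(".,!?") for w in text.lower().split()]
        hits = sum(1 for w in words if w in self.keywords)
        return min(1.0, 0.4 * hits)


class GhostProfileAgent:
    """Stand-in for the Ghost Profile Agent: simulates one user's reaction."""

    def __init__(self, profile: UserProfile):
        self.profile = profile

    def would_object(self, domain: str, severity: float) -> bool:
        threshold = self.profile.thresholds.get(
            domain, self.profile.default_threshold
        )
        return severity >= threshold


class ManagerAgent:
    """Stand-in for the Manager Agent: selects experts and combines results."""

    def __init__(self, experts: List[ExpertAgent]):
        self.experts = experts

    def moderate(self, text: str, profile: UserProfile) -> dict:
        ghost = GhostProfileAgent(profile)
        flagged = []
        for expert in self.experts:
            score = expert.severity(text)
            # Only experts that find something relevant are consulted further.
            if score > 0 and ghost.would_object(expert.domain, score):
                flagged.append(expert.domain)
        return {"hide": bool(flagged), "flagged_domains": flagged}


experts = [
    ExpertAgent("insults", {"idiot", "stupid"}),
    ExpertAgent("violence", {"kill", "attack"}),
]
manager = ManagerAgent(experts)
post = "That referee is an idiot."

sensitive_user = UserProfile(thresholds={"insults": 0.3})
tolerant_user = UserProfile(thresholds={"insults": 0.9})

decision_sensitive = manager.moderate(post, sensitive_user)
decision_tolerant = manager.moderate(post, tolerant_user)
```

In this sketch the same post is hidden for the sensitive user and shown to the tolerant one, which is the behaviour a non-personalised, single-threshold baseline cannot reproduce.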
