Beyond One-Size-Fits-All: Personalized Harmful Content Detection with In-Context Learning

Rufan Zhang; Lin Zhang; Xianghang Mi

arXiv:2511.05532·cs.CL·November 11, 2025



TL;DR

This paper introduces a personalized content moderation framework using in-context learning with foundation models, enabling adaptable, privacy-preserving detection of harmful online content across various categories with minimal user input.

Contribution

It presents a novel ICL-based approach for unified, personalized moderation that requires no retraining and adapts quickly with few examples or definitions.

Findings

01. Foundation models generalize well across tasks.

02. Personalization is effective with a single example.

03. Prompt augmentation improves robustness.

Abstract

The proliferation of harmful online content--e.g., toxicity, spam, and negative sentiment--demands robust and adaptable moderation systems. However, prevailing moderation systems are centralized and task-specific, offering limited transparency and neglecting diverse user preferences--an approach ill-suited for privacy-sensitive or decentralized environments. We propose a novel framework that leverages in-context learning (ICL) with foundation models to unify the detection of toxicity, spam, and negative sentiment across binary, multi-class, and multi-label settings. Crucially, our approach enables lightweight personalization, allowing users to easily block new categories, unblock existing ones, or extend detection to semantic variations through simple prompt-based interventions--all without model retraining. Extensive experiments on public benchmarks (TextDetox, UCI SMS, SST2) and a…
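The prompt-based interventions described in the abstract (blocking a new category, unblocking an existing one) can be illustrated with a minimal sketch. This is not the authors' implementation: the `ModerationProfile` class, the `build_prompt` function, and the prompt wording are all hypothetical, chosen only to show how a per-user set of category definitions and demonstrations could be assembled into an in-context prompt without any retraining.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ModerationProfile:
    """Hypothetical per-user settings (illustrative names, not from the paper)."""

    # Category name -> short natural-language definition.
    categories: dict[str, str] = field(default_factory=dict)
    # (text, label) demonstration pairs shown to the model in context.
    examples: list[tuple[str, str]] = field(default_factory=list)

    def block(self, category: str, definition: str, example: str | None = None) -> None:
        """Start detecting a category, optionally with one demonstration."""
        self.categories[category] = definition
        if example is not None:
            self.examples.append((example, category))

    def unblock(self, category: str) -> None:
        """Stop detecting a category and drop its demonstrations."""
        self.categories.pop(category, None)
        self.examples = [(t, lab) for t, lab in self.examples if lab != category]


def build_prompt(profile: ModerationProfile, message: str) -> str:
    """Assemble an in-context classification prompt (assumed format)."""
    lines = [
        "Classify the message into one of the labels below, or 'none'.",
        "",
        "Labels:",
    ]
    for name, definition in profile.categories.items():
        lines.append(f"- {name}: {definition}")
    if profile.examples:
        lines += ["", "Examples:"]
        for text, label in profile.examples:
            lines.append(f"Message: {text}")
            lines.append(f"Label: {label}")
    lines += ["", f"Message: {message}", "Label:"]
    return "\n".join(lines)


if __name__ == "__main__":
    profile = ModerationProfile()
    profile.block("toxicity", "insults or abusive language")
    profile.block("spam", "unsolicited promotion", example="WIN a FREE prize now!!!")
    print(build_prompt(profile, "hello there"))
```

The resulting string would be passed to whichever foundation model is in use; that call is deliberately omitted here because the model and its API are not specified in this summary. The personalization lives entirely in the prompt, so changing a user's preferences amounts to editing the profile and rebuilding the prompt.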


Code & Models

Datasets

ChaseLabs/Harmful-Texts-On-Mastodon
dataset · 23 dl


Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Spam and Phishing Detection · Misinformation and Its Impacts